Chapter 1

Introduction 


1.1 Introduction 

Equations are used in applied sciences and engineering to characterize the state of 
mechanical, economic, physical, environmental, and other systems. For a long time
it has been assumed that the coefficients of, the input to, and the end conditions 
for these equations are deterministic and perfectly known. Yet, owing to inherent 
variability and/or incomplete knowledge, the entries of most state equations are 
uncertain. Equations with deterministic, perfectly known entries are called deter- 
ministic equations (DEs); those with random entries are called stochastic equations 
(SEs). Although DEs are useful in many cases, they may provide limited information 
even on the trend of a system state. In contrast, SEs can capture both the trend and 
variability of a system state. 

The study of SEs expanded significantly during the last decades in both mathe- 
matical and applied literatures. The class of stochastic equations in the mathematical 
literature is a rather small subset of that encountered in applications. The thrust of 
most mathematical studies is on the establishment of conditions for the existence 
and uniqueness of the solution of SEs [1-7]. On the other hand, the focus of most
applied studies is the development of accurate and efficient methods for calculating 
solution statistics. Technical details related to the existence and uniqueness of the 
solution of these equations are frequently overlooked. 

The main objective of the book is to foster interactions between engineers, scientists,
and mathematicians with a view to promoting the development of accurate
and efficient methods for solving SEs based on rigorous mathematical arguments. 
It is hoped that these interactions will introduce engineers and scientists to rigorous 
mathematical arguments and apprise mathematicians of novel theoretical problems 
rooted in applications. 

Following are examples of DEs encountered in various applied fields together 
with their stochastic counterparts. Potential limitations of DEs are illustrated by a 
simple mechanical system. Let \(U(x)\) denote the displacement function for a beam
with length \(l > 0\) fixed at the left end and free at the right end. This function
satisfies at equilibrium the ordinary differential equation

\[ K(x)\,\frac{d^2 U(x)}{dx^2} = -M(x), \quad x \in (0, l), \tag{1.1} \]

with boundary conditions \(U(0) = 0\) and \(U'(0) = 0\), where \(K(x)\) is the beam stiffness,
\(M(x) = -\int_x^l (\xi - x)\,Q(\xi)\,d\xi\) denotes the bending moment, and \(Q(x)\) is the action
on the beam. The price X(t) of a stock at time t and its evolution in time, a commonly 
used descriptor in equity market, can be modeled by an ordinary differential equation 
of the type 

\[ \frac{dX(t)}{dt} = c\,X(t) + \sigma\,X(t)\,(\text{“noise”})(t), \quad t > 0, \tag{1.2} \]

where the constants \(c\) and \(\sigma\) are the mean return rate and volatility, respectively, and
(“noise”)\((t)\) denotes a function of time capturing market fluctuations ([6], Chap. 10).
The pollutant concentration \(U(x, t)\) at location \(x \in D\) and time \(t\) in a medium with
permeability \(E(x)\) under a flux \(W(x, t)\), a metric of great environmental interest,
satisfies the partial differential equation

\[ \frac{\partial U(x, t)}{\partial t} = \nabla \cdot \big(E(x)\,\nabla U(x, t)\big) + W(x, t), \quad x \in D, \ t > 0, \tag{1.3} \]

accompanied by initial and boundary conditions, where \(\nabla\) denotes the differential
operator \((\partial/\partial x_1, \ldots, \partial/\partial x_d)\) and \(D\) is a bounded subset of \(\mathbb{R}^d\), \(d < 3\).

Analytical solutions for differential equations are only possible in simple cases. 
In most applications, these equations need to be solved by finite difference, finite 
element, or other numerical methods that involve space and/or time discretization. 
The equations generated by numerical methods differ from the original differential 
equations. For example, the finite difference approximation of (1.1) is an algebraic 
equation of the form 


AV + C = 0, (1.4) 

where V is a vector collecting values of U(x) at a finite number of points in the 
domain of definition of this equation, A is a square matrix whose entries depend on 
beam stiffness and boundary conditions, and C collects values of M(x) at a finite 
number of spatial coordinates. The discrete version of (1.2) is a recurrence formula 
giving future stock values as a function of their current values and market volatility. 
The finite difference approximation of the partial differential equation in (1.3) is the 
ordinary differential equation 

\[ \frac{dV(t)}{dt} = A\,V(t) + C(t), \tag{1.5} \]

where A depends on medium permeability E (x) and boundary conditions, and the 
coordinates of V(t) and C(t) are values of U(x, t) and W(x,t) at a finite number of 
spatial coordinates. 
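
The passage from (1.1) to the algebraic form (1.4) can be sketched in a few lines of Python. This is only a minimal illustration under assumptions that are not in the text: constant stiffness K and constant action Q, a uniform grid, central differences, and a ghost node to impose U'(0) = 0. The function name `beam_fd` and its parameters are hypothetical.

```python
def beam_fd(K=1.0, Q=1.0, length=1.0, n=200):
    """Assemble and solve A V + C = 0 for the cantilever beam in (1.1).

    Assumes constant K and Q (an illustrative choice). Unknowns are
    V = (U_1, ..., U_n) on the grid x_i = i*h; U_0 = 0 is imposed directly.
    """
    h = length / n
    x = [i * h for i in range(n + 1)]
    # Bending moment M(x) = -Q (l - x)^2 / 2 for a constant action Q.
    M = [-Q * (length - xi) ** 2 / 2.0 for xi in x]
    s = K / h**2

    A = [[0.0] * n for _ in range(n)]
    # Node 0: ghost node U_{-1} = U_1 (central difference of U'(0) = 0).
    A[0][0] = 2.0 * s
    # Nodes 1..n-1: K (U_{i-1} - 2 U_i + U_{i+1}) / h^2 + M(x_i) = 0.
    for i in range(1, n):
        if i >= 2:
            A[i][i - 2] = s
        A[i][i - 1] = -2.0 * s
        A[i][i] = s
    C = M[:n]  # values of M at x_0, ..., x_{n-1}

    # A is lower triangular here, so forward substitution solves A V = -C.
    V = [0.0] * n
    for i in range(n):
        acc = -C[i] - sum(A[i][j] * V[j] for j in range(max(0, i - 2), i))
        V[i] = acc / A[i][i]
    return x, [0.0] + V


x, U = beam_fd()
print(U[-1])  # tip displacement, to be compared with Q l^4 / (8 K)
```

For K = Q = l = 1 the computed tip value can be compared with the closed-form value Q l^4/(8K) = 0.125 for space-invariant stiffness and action.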

A common assumption in (1.1)-(1.3) is that both the coefficients of and the input
to these equations are deterministic and perfectly known. In this classical formulation,
\(K(x)\) and \(Q(x)\) in (1.1); \(c\), \(\sigma\), (“noise”)\((t)\), and initial state \(X(0)\) in (1.2); and
\(E(x)\), \(W(x, t)\), and the associated initial and boundary conditions in (1.3) are perfectly
known, deterministic functions, so that the state of these systems is described
by solutions of deterministic equations. The theory of DEs is well established and is
discussed in numerous books [8-13].

As previously stated, the classical formulation using DEs to describe problems 
can be unsatisfactory since (i) the coefficients, the input, and/or the end condi- 
tions of these equations can be uncertain due to incomplete information, limited 
understanding of some phenomena, complex relationships between microscopic and 
macroscopic properties of materials, and inherent randomness and (ii) DEs deliver 
a single solution for a system state, rather than a range of likely solutions needed 
to characterize the performance of systems encountered in applications, that is, systems
with uncertain properties. For example, suppose stiffness \(K(x) = K\) and action
\(Q(x) = Q\) in (1.1) are space invariant, so that \(U(l) = Q\,l^4/(8K)\). If \(K\) and \(Q\) are
uncertain parameters described by independent, uniformly distributed random variables
in the ranges \((k_1, k_2)\), \(0 < k_1 < k_2\), and \((q_1, q_2)\), \(0 < q_1 < q_2\), respectively,
the first two moments of the tip displacement \(U(l)\) are

\[ E[U(l)] = \frac{l^4}{8}\,\frac{q_1 + q_2}{2}\,\frac{\ln(k_2/k_1)}{k_2 - k_1}, \qquad E[U(l)^2] = \left(\frac{l^4}{8}\right)^2 \frac{q_1^2 + q_1 q_2 + q_2^2}{3}\,\frac{1}{k_1 k_2}. \tag{1.6} \]

The tip displacement corresponding to the average values, \(\bar q = (q_1 + q_2)/2\) and
\(\bar k = (k_1 + k_2)/2\), of \(Q\) and \(K\), that is, the solution of a DE with \((Q, K)\) set equal to
\((\bar q, \bar k)\), is \(u(l) = \bar q\,l^4/(8\bar k)\). For \((q_1, q_2) = (1, 2)\) and \((k_1, k_2) = (0.5, 1)\), we have
\(E[U(l)] = 2.0794\,(l^4/8)\), \(\mathrm{Std}[U(l)] = (E[U(l)^2] - E[U(l)]^2)^{1/2} = 0.5853\,(l^4/8)\),
and \(u(l) = 2\,(l^4/8)\). The solution of the DE approximates satisfactorily the mean \(E[U(l)]\)
of \(U(l)\), but provides no information on the likely range of values of the tip displacement.
However, even the mean of \(U(l)\) may be missed by DEs. For example,
\(E[U(l)] = 3.8376\,(l^4/8)\) and \(u(l) = 2.7273\,(l^4/8)\) for \(Q\) as previously and \(K\) uniformly
distributed in \((0.1, 1)\).
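
The numbers quoted for the tip displacement can be checked with a short Python sketch. It evaluates the closed-form moments of Q/K for independent uniform Q and K, that is, U(l) in units of l^4/8, and compares them with a plain Monte Carlo estimate. The function names, the sample size, and the seed are illustrative choices, not part of the text.

```python
import math
import random


def tip_moments(q1, q2, k1, k2):
    """Mean and standard deviation of Q/K, i.e. U(l) in units of l^4/8,
    for independent Q ~ U(q1, q2) and K ~ U(k1, k2)."""
    mean = 0.5 * (q1 + q2) * math.log(k2 / k1) / (k2 - k1)
    # E[Q^2] = (q1^2 + q1 q2 + q2^2)/3 and E[1/K^2] = 1/(k1 k2).
    second = (q1**2 + q1 * q2 + q2**2) / 3.0 / (k1 * k2)
    return mean, math.sqrt(second - mean**2)


def tip_moments_mc(q1, q2, k1, k2, n=200_000, seed=0):
    """Monte Carlo estimate of the same two quantities."""
    rng = random.Random(seed)
    s = s2 = 0.0
    for _ in range(n):
        u = rng.uniform(q1, q2) / rng.uniform(k1, k2)
        s += u
        s2 += u * u
    mean = s / n
    return mean, math.sqrt(s2 / n - mean**2)


def deterministic_tip(q1, q2, k1, k2):
    """Solution of the DE with (Q, K) set to their averages, in units of l^4/8."""
    return (0.5 * (q1 + q2)) / (0.5 * (k1 + k2))


print(tip_moments(1, 2, 0.5, 1), deterministic_tip(1, 2, 0.5, 1))
print(tip_moments(1, 2, 0.1, 1), deterministic_tip(1, 2, 0.1, 1))
```

The second call illustrates the point made above: widening the range of K moves the mean of U(l) well away from the deterministic solution.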

Following are the stochastic counterparts of the deterministic equations in
(1.1)-(1.5). The equation (1.1) becomes a stochastic ordinary differential equation
if action \(Q(x)\) and/or stiffness \(K(x)\) are random functions, and so does (1.2) if \((c, \sigma)\)
are random variables and/or (“noise”)\((t)\) is a stochastic process; (1.3) becomes a
stochastic partial differential equation if permeability \(E(x)\), flux \(W(x, t)\), and/or initial
and boundary conditions are random functions; (1.4) becomes a stochastic algebraic
equation if it corresponds to the stochastic version of (1.1); and (1.5) becomes a
stochastic ordinary differential equation if the permeability \(E(x)\) and the flux \(W(x, t)\)
are random functions.

The white noise input model considered almost exclusively in mathematical
studies precludes the use of classical calculus since some of the integrals in the
expression of the solution are not defined in the Riemann-Stieltjes sense. For example,
the noise in (1.2) is usually viewed as the formal derivative of a standard Brownian
motion \(B(t)\), \(t > 0\), a process with independent Gaussian increments, mean 0, variance
\(E[B(t)^2] = t\), and continuous, non-differentiable samples. The corresponding
differential and integral forms of (1.2) with “(noise)\((t)\)” replaced with the formal
derivative \(dB(t)/dt\) of a Brownian motion are

\[ dX(t) = c\,X(t)\,dt + \sigma\,X(t)\,dB(t), \quad t > 0, \quad \text{and} \]
\[ X(t) = x_0 + c \int_0^t X(s)\,ds + \sigma \int_0^t X(s)\,dB(s), \quad t > 0, \tag{1.7} \]

for \(X(0) = x_0\). The integral \(\int_0^t X(s)\,dB(s)\) in (1.7) is an Ito rather than Riemann-Stieltjes
integral. The solution of the Ito stochastic equations in (1.7) is

\[ X(t) = x_0 \exp\big[(c - \sigma^2/2)\,t + \sigma\,B(t)\big], \tag{1.8} \]

and can be obtained by Ito’s calculus ([14], Example 1.3.8). The plots in the left
and the right panels of Fig. 1.1 show samples of \(X(t)\) for \((c, \sigma) = (0.5, 0.1)\) and
\((c, \sigma) = (0.5, 1)\). The significant difference between samples of \(X(t)\) corresponding
to the same mean return rate \(c\) and noise \(B(t)\), but different volatilities, shows that the
use of a single (deterministic) value for \(\sigma\) is unrealistic if \(\sigma\) is uncertain since \(X(t)\) is
sensitive to the particular value used for this parameter.

Fig. 1.1 Samples of stock prices for \((c, \sigma) = (0.5, 0.1)\) (left panel) and \((c, \sigma) = (0.5, 1)\) (right panel)
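
Sample paths of the kind described for Fig. 1.1 can be generated directly from (1.8). The Python sketch below is a minimal illustration, not the procedure used for the figure: it builds one discretized Brownian path from independent Gaussian increments and evaluates the closed-form solution for two volatilities on that same path. The step size, horizon, seed, and function names are assumptions made for the example.

```python
import math
import random


def brownian_path(n_steps, dt, rng):
    """Discretized Brownian motion: B(0) = 0, independent N(0, dt) increments."""
    b = [0.0]
    for _ in range(n_steps):
        b.append(b[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return b


def stock_path(x0, c, sigma, b, dt):
    """Evaluate X(t) = x0 exp[(c - sigma^2/2) t + sigma B(t)] on the grid t = k*dt."""
    return [
        x0 * math.exp((c - 0.5 * sigma**2) * k * dt + sigma * bk)
        for k, bk in enumerate(b)
    ]


rng = random.Random(1)          # fixed seed, illustrative only
dt, n_steps = 0.01, 100         # time horizon t = 1
b = brownian_path(n_steps, dt, rng)

x_low = stock_path(1.0, 0.5, 0.1, b, dt)   # (c, sigma) = (0.5, 0.1)
x_high = stock_path(1.0, 0.5, 1.0, b, dt)  # (c, sigma) = (0.5, 1)
print(x_low[-1], x_high[-1])
```

Because both paths use the same noise sample, any difference between them is due to the volatility alone.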

The complexity of most SEs encountered in applications has prevented the devel- 
opment of general and rigorous methods for solving these equations. Most meth- 
ods for solving SEs in the applied literature are based on heuristic arguments and 
approximations whose quality may be difficult to assess. Rigorous methods from the 
mathematical literature can be used for the class of SEs that can be recast in standard 
format, that is, stochastic equations with deterministic coefficients driven by white 
noise. For example, if the driving noise in (1.2) is a colored noise \(Z(t)\) defined by
\(dZ(t) = -\rho\,Z(t)\,dt + \sqrt{2\rho}\,dB(t)\), \(\rho > 0\), \(t > 0\), then the state \((X, Z)\) satisfies the
differential equation

\[ \begin{aligned} dX(t) &= c\,X(t)\,dt + \sigma\,X(t)\,Z(t)\,dt \\ dZ(t) &= -\rho\,Z(t)\,dt + \sqrt{2\rho}\,dB(t), \end{aligned} \tag{1.9} \]


that is, an ordinary stochastic differential equation with deterministic coefficients 
driven by white noise. Similar arguments hold for discrete versions of stochastic 
equations of the type in (1.5) if \(C(t)\) can be represented as the output of a filter to
white noise. It is also possible in some cases to convert SEs with random coefficients
into SEs with deterministic coefficients. For example, if the volatility \(\sigma\) in (1.7) is
uncertain and modeled by a random variable \(Y\), it can be viewed as the solution of
\(d\sigma(t) = 0\) with initial condition \(\sigma(0) = Y\), so that the bivariate process \((X(t), \sigma(t))\)
satisfies an ordinary differential equation driven by white noise. Other examples in
which this conversion is possible can be found in [15]. 


1.2 Organization 

The book is largely self-contained with nine chapters and two appendices providing 
a summary of useful facts. Chapters 2 and 3 review essential concepts on probability 
theory and random functions, present Monte Carlo algorithms for generating samples 
of random variables and functions, and construct estimators for properties of random 
elements. Chapter 4 illustrates the need for alternatives to the Riemann-Stieltjes inte- 
gral, defines the Ito integral, and establishes properties of these integrals. Chapter 5 
establishes Ito’s formula for continuous and arbitrary semimartingales, defines sto- 
chastic differential equations, and gives conditions for the existence and uniqueness 
of the solutions for these equations. The Ito formula is applied to develop moment 
equations for the state of dynamic systems driven by random noise, solve locally a 
class of deterministic partial differential equations, and establish Girsanov’s formula. 

The focus of the remaining chapters is on methods for solving stochastic alge- 
braic and differential equations. Chapter 6 examines methods for constructing prob- 
abilistic models for the random entries of stochastic equations. Models for random 
variables and functions are used to describe random microstructures. Linear models 
with random coefficients used almost exclusively to characterize the coefficients of 
elliptic stochastic partial differential equations are also examined. Chapter 7 consid- 
ers ordinary differential equations with random input and deterministic and random 
coefficients. Ito’s formula is applied to develop Fokker-Planck and other equations 
for the state of dynamic deterministic systems driven by random noise that are
commonly used in random vibration theory. Equations with random coefficients
and input are solved by Monte Carlo simulation, conditional analysis, stochastic 
reduced order models, stochastic Galerkin, and stochastic collocation. Numerical 
examples are presented to illustrate the implementation of these methods and assess 
their performance. The special case of equations with coefficients of small uncertainty
is solved by Taylor, Neumann, and perturbation series. Chapter 8 deals with
stochastic algebraic equations, and solves these equations by Monte Carlo, stochas- 
tic reduced order models, stochastic Galerkin, stochastic collocation, and reliability 
methods. Numerical examples provide insight into the relative accuracy and efficiency
of these methods. Taylor, Neumann, and perturbation series are applied to solve 
stochastic algebraic equations with random entries of small uncertainty. Chapter 9 
considers stochastic partial differential equations. It is first noted that the methods 
in Chaps. 7-8 can be used to solve discrete versions of these equations. Then, an 
equation of the type considered in the mathematical literature is defined and solved. 
Following these preliminary considerations, equations commonly encountered in 
applications are defined and solved by Monte Carlo, stochastic reduced order mod- 
els, stochastic Galerkin, and stochastic collocation. Numerical examples illustrate 
the application of these methods and their relative performance. The special case
of these equations with random entries of small uncertainty is solved by Taylor,
Neumann, and perturbation series. The presentation in Chaps. 7-9 uses concepts 
introduced in Chaps. 2-5 and Appendices A-B and probabilistic models constructed 
in Chap. 6. 


1.3 Classroom Use 

This book can be used as a text for three one-semester graduate courses emphasizing
various topics from the book. Following are potential titles and contents of these 
courses. 

• Formulation of stochastic equations. 

The course may review essentials of probability theory and random functions 
(Chaps. 2 and 3), discuss the construction of probabilistic models for both the 
coefficients of and the input to stochastic equations (Chap. 6), Monte Carlo algo- 
rithms for generating samples of random elements considered in Chaps. 2 and 3, 
and Monte Carlo solutions of stochastic equations discussed in Chaps. 7-9. 

• Random vibration by Ito’s calculus. 

The course may largely follow developments in the first part of Chap. 7 dealing with 
linear and nonlinear dynamic systems subjected to random noise. The derivations 
in this chapter use concepts in Chaps. 4 and 5 on stochastic integrals, Ito’s formula, 
and stochastic differential equations. 

• Stochastic partial differential equations. 

The study of stochastic elliptic partial differential equations and applications to 
random heterogeneous materials in Chap. 9 provide ample material for this course. 
The course can be extended to a two-semester course by including the class of
stochastic equations examined in Chaps. 7 and 8 and facts from Appendix B. 

Chapter 2

Essentials of Probability Theory 


2.1 Introduction 

Many properties of physical systems and/or actions of these systems are uncertain, 
so that the behavior of these systems cannot be forecasted in a precise deterministic 
manner; it can only be described probabilistically. For example, the weather per- 
son tells us the chance of rain tomorrow. Engineers calculate the likelihood that a 
particular mechanical system will perform according to specified standards. 

Suppose a relevant output of a physical system depends on a finite number 
of uncertain parameters. For weather forecasting, these parameters relate to the 
current meteorologic conditions and atmospheric processes. For aircraft design, these 
parameters include material properties, state of electronic components, and flight 
patterns. Our objective is to calculate the probability that an output of interest has 
specified features, such as the chance of rain tomorrow for the weather person or 
the adequate aircraft performance for the aircraft engineer. This chapter provides the 
framework for calculating these types of probabilities. 

2.2 Probability Space 

A probability space \((\Omega, \mathcal{F}, P)\) consists of a sample space \(\Omega\), a \(\sigma\)-field \(\mathcal{F}\), and a
probability measure \(P\). Sample spaces, \(\sigma\)-fields, and probability measures are defined
and illustrated by examples.


2.2.1 Sample Space 


Definition 2.1 A set \(\Omega\) collecting the outcomes of a particular experiment is called a
sample space.

For example, \(\Omega = \{\text{head}, \text{tail}\}\), \(\Omega = \{1, 2, 3, 4, 5, 6\}\), and \(\Omega = [a, b] \subset [0, \infty)\)
are sample spaces for the experiments of tossing a coin, rolling a die, and measuring
daily rainfall amounts in Ithaca, NY. The first two sample spaces and the last sample
space have, respectively, a finite and an uncountable number of elements. Sample
spaces can be finite, countable, or uncountable.

2.2.2 \(\sigma\)-Field

Consider a game in which one wins $10 and loses $5 for outcomes of a die rolling
experiment in \(\{1, 2\}\) and \(\{3, 4, 5, 6\}\), respectively. The particular outcome \(\{\omega\}\) is not
relevant. The relevant information is contained in \(\mathcal{A} = \{\{1, 2\}, \{3, 4, 5, 6\}\}\) since
one wins $10 if \(\omega \in \{1, 2\}\) and loses $5 if \(\omega \in \{3, 4, 5, 6\}\). \(\sigma\)-fields are collections of
subsets of \(\Omega\) that are relevant for a particular objective. The \(\sigma\)-field for the game considered
here includes \(\mathcal{A}\).

Definition 2.2 A collection \(\mathcal{F}\) of subsets of \(\Omega\) is said to be a \(\sigma\)-field on a sample
space \(\Omega\) if (1) \(\emptyset \in \mathcal{F}\), (2) \(A \in \mathcal{F}\) implies \(A^c \in \mathcal{F}\), and (3) \(A_i \in \mathcal{F}\), \(i \in I\), implies
\(\cup_{i \in I} A_i \in \mathcal{F}\), where \(I\) is a countable set. The members of \(\mathcal{F}\) are called events, or
\(\mathcal{F}\)-measurable subsets of \(\Omega\), or just measurable subsets of \(\Omega\). The pair \((\Omega, \mathcal{F})\) is
said to be a measurable space.

The first and the third conditions in the previous definition can be replaced with
\(\Omega \in \mathcal{F}\) and \(\cap_{i \in I} A_i \in \mathcal{F}\) by using condition (2) and De Morgan’s formula. Also note
that the last two conditions in the definition of \(\mathcal{F}\) are consistent with our intuition,
which suggests that \(A^c\) is observed if \(A\) is not and that \(\cup_{i \in I} A_i\) is observed if a subset
of \(\{A_i, i \in I\}\) is.

Example 2.1 The \(\sigma\)-field \(\mathcal{F}\) associated with the game in which one wins $10
and loses $5 for outcomes of a die rolling experiment in \(\{1, 2\}\) and \(\{3, 4, 5, 6\}\)
is \(\mathcal{F} = \{\emptyset, \{1, 2\}, \{3, 4, 5, 6\}, \Omega\}\), where \(\Omega = \{1, 2, 3, 4, 5, 6\}\). If the game is
modified such that one wins $10 if \(\omega \in \{1, 2\}\) and loses $3, $4, $5, and $6 for
outcomes \(\omega = 3, 4, 5\), and 6, respectively, \(\mathcal{F}\) is too coarse to capture all relevant
events; it needs to be refined to include the events \(\{1, 2\}\), \(\{3\}\), \(\{4\}\), \(\{5\}\), and \(\{6\}\),
and complements and unions of these events. ◇

Example 2.2 Atoms are the finest members of a \(\sigma\)-field \(\mathcal{F}\), that is, \(A \in \mathcal{F}\) is an
atom of \(\mathcal{F}\) if any event \(B \in \mathcal{F}\) included in \(A\) is either \(\emptyset\) or \(A\). The sets \(\{1, 2\}\) and
\(\{3, 4, 5, 6\}\) are atoms of the first \(\sigma\)-field and the sets \(\{1, 2\}\), \(\{3\}\), \(\{4\}\), \(\{5\}\), \(\{6\}\) are
atoms of the second \(\sigma\)-field (Example 2.1). ◇

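Definition 2.3 The \(\sigma\)-field generated by a collection \(\mathcal{A}\) of subsets of \(\Omega\) is

\[ \sigma(\mathcal{A}) = \bigcap_{\mathcal{G} \supseteq \mathcal{A}} \mathcal{G}, \quad \text{where } \{\mathcal{G}\} \text{ are } \sigma\text{-fields on } \Omega. \tag{2.1} \]

There is no \(\sigma\)-field smaller than \(\sigma(\mathcal{A})\) that includes \(\mathcal{A}\).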
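
For a finite sample space the generated σ-field can be computed by brute force, which may help make Definition 2.3 and the atoms of Example 2.2 concrete. The Python sketch below closes a collection under complements and pairwise unions until nothing new appears; on a finite set this closure is the smallest σ-field containing the collection. The function names are hypothetical, and the approach is only practical for small sample spaces.

```python
from itertools import combinations


def generate_sigma_field(omega, collection):
    """Smallest sigma-field on the finite set `omega` containing `collection`."""
    omega = frozenset(omega)
    field = {frozenset(), omega} | {frozenset(a) for a in collection}
    changed = True
    while changed:
        changed = False
        current = list(field)
        for a in current:                      # closure under complements
            comp = omega - a
            if comp not in field:
                field.add(comp)
                changed = True
        for a, b in combinations(current, 2):  # closure under (finite) unions
            u = a | b
            if u not in field:
                field.add(u)
                changed = True
    return field


def atoms(field):
    """Nonempty members of `field` with no nonempty proper subset in `field`."""
    nonempty = [a for a in field if a]
    return {a for a in nonempty if not any(b < a for b in nonempty)}


omega = {1, 2, 3, 4, 5, 6}
coarse = generate_sigma_field(omega, [{1, 2}])
fine = generate_sigma_field(omega, [{1, 2}, {3}, {4}, {5}, {6}])
print(len(coarse), len(fine))
print(sorted(sorted(a) for a in atoms(fine)))
```

The two calls correspond to the two games of Example 2.1: the first field has the four members listed there, and the second has one member for every union of the five atoms.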

Example 2.3 Let \(S\) be a metric space and let \(\mathcal{D}\) be the topology induced on \(S\) by
its metric (Sect. B.1.1). The Borel \(\sigma\)-field on \(S\) is the \(\sigma\)-field \(\mathcal{B}(S) = \sigma(\mathcal{D})\) generated
by \(\mathcal{D}\). The members of \(\sigma(\mathcal{D})\) are called Borel sets. The Borel \(\sigma\)-fields on
\(S = \mathbb{R}^d\), \(d > 1\), and \(S = \mathbb{R}\) are generated by the intervals in these spaces and are
denoted by \(\mathcal{B}(\mathbb{R}^d) = \mathcal{B}^d\) and \(\mathcal{B}(\mathbb{R}) = \mathcal{B}^1 = \mathcal{B}\), respectively. The Borel sets on
\(\mathbb{R}\) can be generated by open, semi-open, finite, semi-finite, and other intervals of the
real line ([11], Sect. 1.7). ◇


2.2.3 Probability Measure 

We define measures and probability measures, review properties of probability mea- 
sures useful for applications, and use conditional probabilities to introduce the law 
of total probability and Bayes theorem. 

Definition 2.4 Let \((\Omega, \mathcal{F})\) be a measurable space. A set function \(\mu: \mathcal{F} \to [0, \infty]\)
such that (1) \(\mu(\emptyset) = 0\) and (2) \(\mu(\cup_{n=1}^{\infty} A_n) = \sum_{n=1}^{\infty} \mu(A_n)\) for mutually disjoint sets
\(A_n \in \mathcal{F}\) is called a measure on \((\Omega, \mathcal{F})\). If \((\Omega, \mathcal{F}) = (\mathbb{R}^d, \mathcal{B}^d)\), then \(\mu\) is said to
be a Borel measure. The triple \((\Omega, \mathcal{F}, \mu)\) is called a measure space.

Definition 2.5 If \(\mu\) has finite total mass, that is, \(\mu(\Omega) < \infty\), the measure \(\mu\) is said
to be finite. In this case, it can be normalized to take values in [0, 1]. Any measure
with unit total mass is called a probability measure and is denoted by \(P\). The triple
\((\Omega, \mathcal{F}, P)\) is called a probability space.

Definition 2.6 A measure \(\mu\) on a measurable space \((\Omega, \mathcal{F})\) is said to be \(\sigma\)-finite
if there exists a countable, pairwise disjoint sequence of measurable sets \(\{A_n\}\), that
is, \(A_n \in \mathcal{F}\), \(n = 1, 2, \ldots\), such that \(\mu(A_n) < \infty\), \(n = 1, 2, \ldots\), and
\(\mu(A) = \sum_{n=1}^{\infty} \mu(A \cap A_n)\) for every \(A \in \mathcal{F}\) ([5], Proposition 13).

The Lebesgue measure \(\lambda\) on \((\mathbb{R}, \mathcal{B})\) is \(\sigma\)-finite since there exists a pairwise disjoint
sequence \(A_n = [n, n + 1)\), \(n \in \mathbb{Z}\), such that \(\lambda(A_n) < \infty\) and
\(\lambda(A) = \sum_{n=-\infty}^{\infty} \lambda(A \cap A_n)\) for every \(A \in \mathcal{B}\). Similar arguments show that the Lebesgue
measure on \((\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))\) is \(\sigma\)-finite. However, the Lebesgue measure is not finite.

Definition 2.7 Let \((\Omega, \mathcal{F}, P)\) be a probability space. A set \(N \in \mathcal{F}\) such that
\(P(N) = 0\) is called a null set. A property valid on \(\Omega \setminus N\) is said to hold almost
everywhere (a.e.), almost surely (a.s.), for almost every \(\omega\), or with probability one
(w.p.1).

Definition 2.8 A probability space \((\Omega, \mathcal{F}, P)\) is complete if \(A \subset B\) with
\(B \in \mathcal{F}\) and \(P(B) = 0\) implies \(A \in \mathcal{F}\), which in turn implies \(P(A) = 0\) by the second
property in (2.2).

It is assumed throughout the book that the probability spaces are complete. The
assumption is not restrictive since for any probability space \((\Omega, \mathcal{F}, P)\) there exists
a complete space \((\Omega, \bar{\mathcal{F}}, \bar{P})\) such that \(\mathcal{F} \subset \bar{\mathcal{F}}\) and \(\bar{P} = P\) on \(\mathcal{F}\) ([4], Theorem
2.2.5). The completion of measures is rather simple ([1], p. 4). For example, set
\(\mathcal{N} = \{B \subset \Omega : \exists N \in \mathcal{F} \text{ with } P(N) = 0 \text{ and } B \subset N\}\),
\(\bar{\mathcal{F}} = \{A \cup B : A \in \mathcal{F}, B \in \mathcal{N}\}\), and \(\bar{P}(A \cup B) = P(A)\), where \(A \in \mathcal{F}\) and \(B \in \mathcal{N}\). Then \(\bar{P}\) is a
probability measure on \((\Omega, \bar{\mathcal{F}})\) and \(\bar{\mathcal{F}}\) is a \(\sigma\)-field.

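The following properties of the probability measure \(P\) are useful for calculations,
and result directly from its definition.

\[ \begin{aligned}
& P(A) = 1 - P(A^c), \quad \text{for } A \in \mathcal{F}, \\
& P(A) \le P(B), \quad \text{for } A \subset B, \ A, B \in \mathcal{F}, \\
& P(\cup_{i=1}^{\infty} A_i) \le \sum_{i=1}^{\infty} P(A_i), \quad \text{for } A_i \in \mathcal{F}, \\
& P(A \cup B) = P(A) + P(B) - P(A \cap B), \quad \text{for } A, B \in \mathcal{F}, \text{ and} \\
& P(B) = \sum_{i=1}^{n} P(B \cap A_i), \quad \text{for } A_i, B \in \mathcal{F} \text{ and } A_1, \ldots, A_n \text{ a partition of } \Omega.
\end{aligned} \tag{2.2} \]

The probability measure also satisfies the inclusion-exclusion formula

\[ P(\cup_{i=1}^{n} A_i) = \sum_{i=1}^{n} P(A_i) - \sum_{i=2}^{n} \sum_{j=1}^{i-1} P(A_i \cap A_j) + \sum_{i=3}^{n} \sum_{j=2}^{i-1} \sum_{k=1}^{j-1} P(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n+1} P(\cap_{q=1}^{n} A_q). \tag{2.3} \]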


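Theorem 2.1 If \((\Omega, \mathcal{F}, P)\) is a probability space and \(A_n \in \mathcal{F}\), \(n = 1, 2, \ldots\), are
events on this space, then

\[ P(\cup_{n=1}^{\infty} A_n) \le \sum_{n=1}^{\infty} P(A_n). \tag{2.4} \]

Proof Set \(B_1 = A_1\) and \(B_n = A_n \setminus (\cup_{i=1}^{n-1} A_i)\), \(n > 1\). The sets \(\{B_n\}\) are disjoint
events with the properties \(B_n \subset A_n\) and \(\cup_{n=1}^{\infty} B_n = \cup_{n=1}^{\infty} A_n\). Hence,
\(P(\cup_{n=1}^{\infty} A_n) = P(\cup_{n=1}^{\infty} B_n) = \sum_{n=1}^{\infty} P(B_n) \le \sum_{n=1}^{\infty} P(A_n)\). ■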

Definition 2.9 Let \((\Omega, \mathcal{F}, P)\) be a probability space and \(B \in \mathcal{F}\) an event such that
\(P(B) > 0\). The probability of \(A \in \mathcal{F}\) conditional on \(B\) is

\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad A \in \mathcal{F}. \tag{2.5} \]

The definition is meaningful in the sense that \(P(\cdot \mid B)\) is a probability measure.

Suppose an experiment is performed in which \(B\) either occurs or does not occur,
and that \(P(B), P(B^c) > 0\). If \(B\) or \(B^c\) is observed, the conditional probability of \(A\)
is \(P(A \mid B)\) or \(P(A \mid B^c)\), respectively. Hence, the conditional probability of \(A\) is
equal to \(P(A \cap B)/P(B)\) and \(P(A \cap B^c)/P(B^c)\) with probabilities \(P(B)\) and \(P(B^c)\).
This remark will be revisited later in this chapter (Example 2.56).

The following two formulas involving conditional probabilities are very useful
for applications. Let \((\Omega, \mathcal{F}, P)\) be a probability space and \(A_i \in \mathcal{F}\), \(i = 1, \ldots, n\), a
partition of \(\Omega\), that is, \(\Omega = \cup_{i=1}^{n} A_i\) and \(A_i \cap A_j = \emptyset\) for \(i \ne j\). If \(P(A_i) > 0\),
\(i = 1, \ldots, n\), and \(B \in \mathcal{F}\) is an event with \(P(B) > 0\), then

\[ P(B) = \sum_{i=1}^{n} P(B \cap A_i) = \sum_{i=1}^{n} P(B \mid A_i)\,P(A_i) \quad \text{(law of total probability)} \]
\[ P(A_j \mid B) = \frac{P(A_j)\,P(B \mid A_j)}{P(B)} = \frac{P(A_j)\,P(B \mid A_j)}{\sum_{i=1}^{n} P(A_i)\,P(B \mid A_i)} \quad \text{(Bayes’ formula)}. \tag{2.6} \]

In the Bayesian framework, the probabilities \(P(A_j)\) and \(P(A_j \mid B)\) are referred to
as the prior and the posterior probabilities of \(A_j\).

Example 2.4 Let \(B\) denote the event that a system performs satisfactorily, and
let \(\{A_i, i = 1, 2\}\) be events partitioning \(\Omega\). Suppose \(P(A_1) = 0.8\), \(P(A_2) = 0.2\),
\(P(B \mid A_1) = 0.9\), and \(P(B \mid A_2) = 0.7\). The probability that the system performs
satisfactorily is \(P(B) = (0.9)(0.8) + (0.7)(0.2) = 0.86\). ◇
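
The calculation in Example 2.4 can be reproduced, and extended to posterior probabilities, with a short Python sketch. The two helper functions are hypothetical names; they implement the law of total probability and Bayes’ formula for a finite partition, taking the priors P(A_i) and the conditional probabilities P(B | A_i) as lists.

```python
def total_probability(priors, likelihoods):
    """P(B) = sum_i P(B | A_i) P(A_i) for a partition {A_i}."""
    return sum(p * l for p, l in zip(priors, likelihoods))


def posteriors(priors, likelihoods):
    """P(A_j | B) = P(A_j) P(B | A_j) / P(B), assuming P(B) > 0."""
    pb = total_probability(priors, likelihoods)
    return [p * l / pb for p, l in zip(priors, likelihoods)]


priors = [0.8, 0.2]        # P(A_1), P(A_2) from Example 2.4
likelihoods = [0.9, 0.7]   # P(B | A_1), P(B | A_2)

print(total_probability(priors, likelihoods))  # P(B)
print(posteriors(priors, likelihoods))         # P(A_1 | B), P(A_2 | B)
```

The posterior probabilities sum to one by construction, since the denominator in Bayes’ formula is the sum of the numerators.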


Theorem 2.2 The inequalities

\[ \begin{aligned}
P(\cup_{i=1}^{n} A_i) &\le P_u = \sum_{i=1}^{n} P(A_i) - \sum_{i=2}^{n} \max_{j < i} P(A_j \cap A_i) \quad \text{and} \\
P(\cup_{i=1}^{n} A_i) &\ge P_l = P(A_1) + \sum_{i=2}^{n} \max\Big(0, \ P(A_i) - \sum_{j=1}^{i-1} P(A_j \cap A_i)\Big),
\end{aligned} \tag{2.7} \]

hold for \(A_i \in \mathcal{F}\) arbitrary.

Proof Set \(B_i = A_i^c\), \(i = 1, \ldots, n\), and \(G = \cap_{i=1}^{n-1} B_i\), and note that
\(P(\cap_{i=1}^{n} B_i) = P(G \cap B_n) = P(G) - P(G \cap A_n)\). Repeated application of this formula gives
\(P(\cap_{i=1}^{n} B_i) = P(B_1) - \sum_{k=2}^{n} P(B_1 \cap \cdots \cap B_{k-1} \cap A_k)\) so that

\[ \begin{aligned}
P(\cup_{i=1}^{n} A_i) = 1 - P(\cap_{i=1}^{n} B_i) &= P(A_1) + \sum_{k=2}^{n} P(B_1 \cap \cdots \cap B_{k-1} \cap A_k) \\
&= P(A_1) + \sum_{k=2}^{n} P(B_1 \cap \cdots \cap B_{k-1} \mid A_k)\,P(A_k).
\end{aligned} \]

Since \(P(B_1 \cap \cdots \cap B_{k-1} \mid A_k) \le P(B_j \mid A_k) = 1 - P(A_j \mid A_k)\), \(j = 1, \ldots, k - 1\),
and \(P(B_1 \cap \cdots \cap B_{k-1} \mid A_k) = 1 - P(A_1 \cup \cdots \cup A_{k-1} \mid A_k) \ge 1 - \sum_{j=1}^{k-1} P(A_j \mid A_k)\)
hold, we have the bounds in (2.7). ■

The bounds in (2.7) are relatively simple to calculate since they involve the probabilities
of the events \(A_i\) and \(A_i \cap A_j\), rather than probabilities of intersections of
multiple events as in (2.3). Note that \(P(\cup_{i=1}^{n} A_i)\) can be interpreted as the probability
of failure for a physical system with failure modes \(A_1, \ldots, A_n\) occurring with
probabilities \(P(A_i)\), \(i = 1, \ldots, n\).
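
The bounds in (2.7) are easy to evaluate numerically. The Python sketch below does so for events on a finite sample space with equally likely outcomes, an assumption made only to keep the example exact, and compares the bounds with the exact probability of the union. The function name and the three events are illustrative.

```python
from fractions import Fraction


def union_bounds(events, n_outcomes):
    """Lower bound, exact value, and upper bound for P(union of events).

    `events` is an ordered list of sets of outcomes; outcomes are assumed
    equally likely, so P(A) = |A| / n_outcomes.
    """
    def p(e):
        return Fraction(len(e), n_outcomes)

    upper = sum(p(a) for a in events)
    lower = p(events[0])
    for i in range(1, len(events)):
        # Probabilities of pairwise intersections A_j and A_i, j < i.
        pair = [p(events[j] & events[i]) for j in range(i)]
        upper -= max(pair)
        lower += max(Fraction(0), p(events[i]) - sum(pair))
    exact = p(set().union(*events))
    return lower, exact, upper


# Three hypothetical failure modes on a sample space with 12 outcomes.
modes = [{0, 1, 2, 3}, {2, 3, 4, 5}, {3, 5, 6, 7}]
lower, exact, upper = union_bounds(modes, 12)
print(lower, exact, upper)
```

Only single-event and pairwise-intersection probabilities enter the bounds, which is what makes them attractive when higher-order intersections are hard to compute. The bounds depend on the ordering of the events, so different orderings may give different values.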


2.2.4 Construction of Probability Spaces 

In most applications a random experiment rather than a probability space is specified, 
so that we need to construct a probability space based on the available information. 
We present three methods for constructing probability spaces and illustrate them by 
examples. 

2.2.4.1 Countable Sample Space 

Let \(\Omega = \{\omega_1, \omega_2, \ldots\}\) be a countable sample space and let \(\{p_i > 0\}\) be such that
\(\sum_i p_i = 1\). Take \(\mathcal{F}\) to be the collection of all subsets of \(\Omega\), that is, the power set
of \(\Omega\). The set function \(P: \mathcal{F} \to [0, 1]\) defined by

\[ P(A) = \sum_{\omega_i \in A} p_i, \quad A \in \mathcal{F}, \tag{2.8} \]

is a probability measure on \((\Omega, \mathcal{F})\) since it is countably additive and has the properties
\(P(\emptyset) = 0\) and \(P(\Omega) = 1\).

Example 2.5 Suppose service life has been recorded for \(n\) nominally identical
devices. Let \(0 = \tau_0 < \tau_1 < \cdots < \tau_{m-1}\), let \([0, \tau_1), [\tau_1, \tau_2), \ldots, [\tau_{m-1}, \infty)\) be a
partition of \([0, \infty)\), and denote by \(n_i \le n\) the number of devices with measured
service life in the range \([\tau_{i-1}, \tau_i)\), \(i = 1, \ldots, m\), with the notation \(\tau_m = \infty\).
The members of the probability space \((\Omega, \mathcal{F}, P)\) describing this experiment are
\(\Omega = \{\omega_1, \ldots, \omega_m\}\), \(\mathcal{F}\) = the power set of \(\Omega\), and \(P(\{\omega_i\}) \approx n_i/n\). ◇

2.2.4.2 Product Probability Space 

Let \((\Omega_k, \mathcal{F}_k, P_k)\), \(k = 1, 2\), be probability spaces. The elements of the product
probability space \((\Omega, \mathcal{F}, P)\) are

\[ \begin{aligned}
\Omega &= \Omega_1 \times \Omega_2 = \{(\omega_1, \omega_2) : \omega_k \in \Omega_k, \ k = 1, 2\} \\
\mathcal{F} &= \mathcal{F}_1 \times \mathcal{F}_2 = \sigma(\mathcal{R}), \ \text{where } \mathcal{R} = \{A_1 \times A_2 : A_1 \in \mathcal{F}_1, A_2 \in \mathcal{F}_2\} \\
P &= P_1 \times P_2, \ \text{where } P(A_1 \times A_2) = P_1(A_1)\,P_2(A_2), \ \text{for } A_1 \in \mathcal{F}_1 \text{ and } A_2 \in \mathcal{F}_2
\end{aligned} \tag{2.9} \]

and are called the product sample space, \(\sigma\)-field, and probability measure.

The product sample space \(\Omega\) contains the joint outcomes of, for example, two
experiments. The \(\sigma\)-field \(\mathcal{F}\) can also be obtained from \(\mathcal{F} = \sigma(\mathcal{R}) = \sigma(\mathcal{G}_1, \mathcal{G}_2)\),
where \(\mathcal{G}_1 = \{A_1 \times \Omega_2 : A_1 \in \mathcal{F}_1\}\) and \(\mathcal{G}_2 = \{\Omega_1 \times A_2 : A_2 \in \mathcal{F}_2\}\), since \(\mathcal{G}_1\) and \(\mathcal{G}_2\)
are \(\sigma\)-fields on \(\Omega\) that are included in \(\mathcal{R}\) and every member of \(\mathcal{R}\) is the intersection
of sets from \(\mathcal{G}_1\) and \(\mathcal{G}_2\). The construction of the product probability measure is less
simple. It can be shown that there exists a unique probability \(P\) on \((\Omega, \mathcal{F})\) that
satisfies (2.9) ([4], Theorem 3.3.5).

The formulas in (2.9) can be generalized to define product probability spaces of
\(n \ge 3\) probability spaces. The definition extends directly to the case in which \(n\)
is infinity. If the probability spaces \((\Omega_k, \mathcal{F}_k, P_k)\) are identical, the product sample
space, \(\sigma\)-field, and probability are denoted by \(\Omega^n\), \(\mathcal{F}^n\), and \(P^n\).

Example 2.6 Let \((\Omega_k, \mathcal{F}_k, P_k)\), \(k = 1, 2\), be probability spaces associated with
the experiment of rolling two dice, so that \(\Omega_k = \{1, 2, 3, 4, 5, 6\}\), \(\mathcal{F}_k\) = all
subsets of \(\Omega_k\), and \(P_k(\{i\}) = 1/6\), \(i = 1, \ldots, 6\). The product sample space
\(\Omega = \Omega_1 \times \Omega_2 = \{\omega = (i, j), \ i, j = 1, \ldots, 6\}\) includes all outcomes of the experiment.
The product \(\sigma\)-field \(\mathcal{F}\) consists of all subsets of \(\Omega\) since the members of \(\mathcal{R}\) are
\((i, j)\), \(\cup_{i \in I_1} (i, j)\), \(\cup_{j \in I_2} (i, j)\), \(\cup_{i \in I_1, j \in I_2} (i, j)\), where \(I_1, I_2 \subset \{1, 2, 3, 4, 5, 6\}\).
The product probability measure is \(P(\{\omega\}) = 1/36\), \(\omega \in \Omega\), since the outcomes of
the experiment are equally likely. ◇
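
The product space of Example 2.6 is small enough to enumerate. The following Python sketch builds the product sample space and the product measure on single outcomes, then checks the defining property P(A_1 × A_2) = P_1(A_1) P_2(A_2) on a rectangle. Exact fractions are used to avoid rounding; the helper names and the chosen events are illustrative.

```python
from fractions import Fraction
from itertools import product

# Component spaces for the two dice.
omega1 = omega2 = range(1, 7)
p1 = {i: Fraction(1, 6) for i in omega1}
p2 = dict(p1)

# Product sample space and product probability of each outcome (i, j).
omega = list(product(omega1, omega2))
p = {(i, j): p1[i] * p2[j] for (i, j) in omega}


def prob(event):
    """Probability of an event, given as a collection of outcomes (i, j)."""
    return sum(p[w] for w in event)


def rectangle(a1, a2):
    """Measurable rectangle A_1 x A_2."""
    return set(product(a1, a2))


a1, a2 = {1, 2}, {3, 4, 5, 6}
print(prob(rectangle(a1, a2)))                        # equals P1(A1) * P2(A2)
print(prob([w for w in omega if w[0] + w[1] == 7]))   # an event that is not a rectangle
```

The second event, a total of seven, is not a rectangle but is still a member of the product σ-field, which here consists of all subsets of the product sample space.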

Example 2.7 Consider a facility that fails under wind speeds exceeding a critical
value \(v_{cr}\). Let \(p = P(A)\), where \(A = \{V > v_{cr}\}\) and \(V > 0\) denotes the maximum
yearly wind speed at the facility site. The members of the probability space
\((\Omega, \mathcal{F}, P)\) describing this problem are \(\Omega = \{A, A^c\}\), \(\mathcal{F}\) = all parts of \(\Omega\), and
\(P(A) = p\). To evaluate the probability that the facility performs satisfactorily during
its design life of \(n\) years, we need to construct a new probability space corresponding
to an “experiment” consisting of \(n\) independent repetitions of the yearly experiment.
The construction resembles the experiment of tossing a loaded coin \(n\) times with
sides {1} and {0} corresponding to events \(A\) and \(A^c\) ([11], p. 41).

The members of the product probability space corresponding to an \(n\)-year time
horizon are \(\Omega^n = \{\omega = (\omega_1, \ldots, \omega_n) : \omega_i = 0 \text{ or } 1\}\), \(\mathcal{F}^n\) = all subsets of \(\Omega^n\), and
\(P^n(B) = \sum_{\omega \in B} p^{n_\omega} q^{n - n_\omega}\), where \(B \in \mathcal{F}^n\), \(q = 1 - p\), and \(n_\omega = \sum_{i=1}^{n} \omega_i\) gives
the number of 1’s in \(\omega = (\omega_1, \ldots, \omega_n)\). ◇
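
The product measure of Example 2.7 can be evaluated by enumeration for a short design life. The Python sketch below sums p^{n_ω} q^{n - n_ω} over the outcomes in an event and uses it to obtain the probability of no failure in n years. The values n = 5 and p = 1/50 are hypothetical and chosen only for illustration; enumeration is feasible only for small n.

```python
from fractions import Fraction
from itertools import product


def prob_n(event, p, n):
    """P^n(B) = sum over omega in B of p^(n_omega) * q^(n - n_omega),
    where n_omega is the number of 1's (failure years) in omega."""
    q = 1 - p
    return sum(p ** sum(w) * q ** (n - sum(w)) for w in event)


n = 5
p = Fraction(1, 50)  # hypothetical yearly probability of exceeding v_cr

omega_n = list(product((0, 1), repeat=n))            # all 2^n outcomes
no_failure = [w for w in omega_n if sum(w) == 0]     # satisfactory performance
at_least_one = [w for w in omega_n if sum(w) >= 1]   # failure during design life

print(prob_n(no_failure, p, n))     # equals q^n
print(prob_n(at_least_one, p, n))   # equals 1 - q^n
```

The event of satisfactory performance contains the single outcome with no 1’s, so its probability reduces to q^n, and the probability of at least one failure is its complement.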


2.2.4.3 Extension of Probability Measure 

Let \(\Omega\) be a sample space associated with an experiment and let \(\mathcal{C}\) be a collection
of subsets of \(\Omega\). If \(\mathcal{C}\) is a field on \(\Omega\) and \(R\) is a real-valued, positive, and countably
additive function defined on \(\mathcal{C}\) such that \(R(\Omega) = 1\), then there exists a unique
probability \(P\) on \(\mathcal{F} = \sigma(\mathcal{C})\) such that \(P(A) = R(A)\) for each \(A \in \mathcal{C}\), that is, the
restriction of \(P\) to \(\mathcal{C}\) is equal to \(R\) ([5], Theorem 14, p. 94). A field is a collection of
sets satisfying the conditions of Definition 2.2 with \(I\) finite.

Example 2.8 Let \(\Omega = \mathbb{R}\) and let \(\mathcal{C}\) consist of the empty set and the collection
of all finite unions of intervals of the type \((a, b]\) for \(a < b\), \((-\infty, a]\), \((a, \infty)\),
and \((-\infty, \infty)\). Let \(F: \mathbb{R} \to [0, 1]\) be a continuous increasing function such that
\(\lim_{x \to -\infty} F(x) = 0\) and \(\lim_{x \to \infty} F(x) = 1\), and define \(R: \mathcal{C} \to [0, 1]\) by
\(R((a, b]) = F(b) - F(a)\), \(R((-\infty, a]) = F(a)\), \(R((a, \infty)) = 1 - F(a)\), and
\(R((-\infty, \infty)) = 1\). For example, \(R((a, \infty))\) may represent the probability that the
strength of a particular material exceeds \(a\), so that it can be estimated from experiments.
Since \(\mathcal{C}\) is a field, the set function \(R\) can be extended uniquely to a probability
measure on \((\mathbb{R}, \mathcal{B} = \sigma(\mathcal{C}))\) ([5], Proposition 9, p. 90, and Theorem 14, p. 94). ◇


2.3 Measurable Functions and Random Elements 

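Consider two measurable spaces $(\Omega, \mathcal{F})$ and $(\Psi, \mathcal{G})$ and a mapping $h: \Omega \to \Psi$.

Definition 2.10 The mapping $h: \Omega \to \Psi$ is said to be measurable from $(\Omega, \mathcal{F})$ to $(\Psi, \mathcal{G})$ or $(\mathcal{F}, \mathcal{G})$-measurable if $h^{-1}(\mathcal{G}) \subset \mathcal{F}$, that is, $h^{-1}(B) = \{\omega \in \Omega : h(\omega) \in B\} \in \mathcal{F}$ for all $B \in \mathcal{G}$. This property of $h$ is denoted by $h \in \mathcal{F}/\mathcal{G}$ or just $h \in \mathcal{F}$ if there is no confusion about $\mathcal{G}$.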

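Example 2.9 Let $A \in \mathcal{F}$, $(\Psi, \mathcal{G}) = (\mathbb{R}, \mathcal{B})$, and $1_A$ the indicator function of $A$, that is, $1_A(\omega) = 1$ for $\omega \in A$ and $1_A(\omega) = 0$ for $\omega \notin A$. The function $1_A: \Omega \to \mathbb{R}$ is $\mathcal{F}/\mathcal{B}$-measurable since $1_A^{-1}(\{0\}) = A^c$ and $1_A^{-1}(\{1\}) = A$, so that $1_A^{-1}(\mathcal{B}) = \{\emptyset, \Omega, A, A^c\} \subset \mathcal{F}$. ◊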

Example 2.10 Let $\{A_i\}$, $i = 1, \ldots, n$, be events partitioning $\Omega$ and $\{c_i\}$, $i = 1, \ldots, n$, real constants. The function $h = \sum_{i=1}^n c_i 1_{A_i}$ is $(\mathcal{F}, \mathcal{G})$-measurable, where $\Psi = \{c_1, \ldots, c_n\}$ and $\mathcal{G}$ is the power set of $\Psi$. The inverse images in $\mathcal{F}$ of the members of $\mathcal{G}$ can be obtained simply, for example, $h^{-1}(\{c_i, c_j\}) = A_i \cup A_j \in \mathcal{F}$. ◊

Theorem 2.3 If $h: \mathbb{R}^d \to \mathbb{R}^q$ is continuous, it is also $(\mathcal{B}^d, \mathcal{B}^q)$-measurable.

Proof Function $h$ is measurable if and only if $h^{-1}(\mathcal{D}^q) \subset \mathcal{B}^d$ ([11], Proposition 3.2.1), where $\mathcal{D}^q$ denotes the topology generated by the open balls in $\mathbb{R}^q$. Since $h$ is continuous, we have $h^{-1}(\mathcal{D}^q) \subset \mathcal{D}^d \subset \mathcal{B}^d$ (Sect. B.1.1), where the latter inclusion holds since $\mathcal{B}^d$ is generated by $\mathcal{D}^d$. ▲

Definition 2.11 Let $(\Omega, \mathcal{F}, P)$ be a probability space and $(S, \mathcal{S})$ a measurable space, where $S$ is a metric space and $\mathcal{S} = \sigma(\mathcal{D})$ denotes the $\sigma$-field generated by the topology $\mathcal{D}$ induced on $S$ by its metric (Example 2.3 and Sect. B.1). The mapping $X: \Omega \to S$ is a random element if it is $(\mathcal{F}, \mathcal{S})$-measurable.

The element $X$ in the above definition is a real-valued random variable, complex-valued random variable, random vector or an $\mathbb{R}^d$-valued random variable, $\mathbb{C}^d$-valued random variable, real-valued stochastic process with continuous samples defined on $[0, 1]$, or a real-valued random field with continuous samples defined on a subset $D$ of $\mathbb{R}^{d'}$ if $S = \mathbb{R}$, $S = \mathbb{C}$, $S = \mathbb{R}^d$, $S = \mathbb{C}^d$, $S = C[0, 1]$, or $S = C(D)$, respectively. The image $X(\omega)$, $\omega \in \Omega$, of $X$ is a number, vector in $\mathbb{R}^d$, vector in $\mathbb{C}^d$, real-valued continuous function defined on $[0, 1]$, or a real-valued continuous function defined on $D$ if $S = \mathbb{R}$, $S = \mathbb{R}^d$, $S = \mathbb{C}^d$, $S = C[0, 1]$, or $S = C(D)$, respectively. For example, $X(\omega) = (X_1(\omega), \ldots, X_d(\omega)) \in \mathbb{R}^d$ is a $d$-dimensional vector for $S = \mathbb{R}^d$ and $X(\omega)(t)$, $t \in [0, 1]$, is a sample of a real-valued stochastic process defined on $[0, 1]$ for $S = C[0, 1]$. It is common to denote $X(\omega)(t)$ by $X(t, \omega)$, as will be seen in a subsequent section. For a fixed time $t \in [0, 1]$, $X(t, \omega)$ is a random variable.

Definition 2.12 Let $(\Omega, \mathcal{F}, P)$ be a probability space, $(S, \mathcal{S})$ a measurable space, and $X: \Omega \to S$ an $(\mathcal{F}, \mathcal{S})$-measurable mapping. The probability measure induced by $X$ or the distribution of $X$ is the probability measure

$$Q(B) = P(X^{-1}(B)) = P \circ X^{-1}(B), \quad B \in \mathcal{S}, \tag{2.10}$$

on the measurable space $(S, \mathcal{S})$.

Example 2.11 If $S = \mathbb{R}$ and $B = (-\infty, x]$, $x \in \mathbb{R}$, then $Q(B) = P(\{\omega : X(\omega) \le x\}) = P(X \le x)$ is called the probability distribution function or just the distribution of random variable $X$, and is typically denoted by $F(x) = P(X \le x)$. If $S = \mathbb{R}^d$ and $B = \times_{i=1}^d (-\infty, x_i]$, $x_i \in \mathbb{R}$, then $Q(B) = P(\{\omega : X_i(\omega) \le x_i, i = 1, \ldots, d\}) = P(\cap_{i=1}^d \{X_i \le x_i\})$ is called the joint distribution of random vector $X$. If $S = C[0, 1]$, $\{t_1, \ldots, t_n\} \subset [0, 1]$, and $B = \times_{k=1}^n (-\infty, x_k]$, $x_k \in \mathbb{R}$, then $Q(B) = P(\{\omega : X(t_k, \omega) \le x_k, k = 1, \ldots, n\}) = P(\cap_{k=1}^n \{X(t_k) \le x_k\})$ is called the finite dimensional distribution of order $n$ of $X$. The finite dimensional distributions of a stochastic process $X$ are the joint distributions of random vectors $(X(t_1), \ldots, X(t_n))$ consisting of values of the process at times $t_1, \ldots, t_n$. ◊

Example 2.12 Let $(\Omega, \mathcal{F}, P)$ be a probability space, $A \in \mathcal{F}$, $S = \{0, 1\}$, and $1_A: \Omega \to S$ an indicator function. The function $1_A$ is $(\mathcal{F}, \mathcal{S})$-measurable, where $\mathcal{S} = \{\emptyset, S, \{0\}, \{1\}\}$. The probability measure $Q$ induced by $1_A$ on $(S, \mathcal{S})$ is $Q(\{0\}) = P(1_A^{-1}(\{0\})) = P(A^c)$ and $Q(\{1\}) = P(1_A^{-1}(\{1\})) = P(A)$. The distribution of random variable $1_A$ is $F(x) = P(A^c)\,1(x \ge 0) + P(A)\,1(x \ge 1)$. ◊

Example 2.13 Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $X: \Omega \to \mathbb{R}$ be $(\mathcal{F}, \mathcal{B})$-measurable. Suppose the probability measure $Q$ induced by $X$ on $(\mathbb{R}, \mathcal{B})$ has the expression $Q((-\infty, x]) = P(X^{-1}((-\infty, x])) = \Phi((x - \mu)/\sigma)$, where $x$, $\mu$, and $\sigma > 0$ are reals, $\Phi(u) = \int_{-\infty}^{u} \phi(\xi)\,d\xi$, and $\phi(\xi) = \exp(-\xi^2/2)/\sqrt{2\pi}$. Then $X$ is said to be a Gaussian random variable with mean $\mu$, standard deviation $\sigma$, and variance $\sigma^2$, a property denoted by $X \sim N(\mu, \sigma^2)$. ◊
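The Gaussian distribution of Example 2.13 can be evaluated numerically through the error function, using the identity $\Phi(u) = (1 + \mathrm{erf}(u/\sqrt{2}))/2$. The sketch below is illustrative only; the function name `gaussian_cdf` and its default arguments are assumptions introduced here, not notation from the text.

```python
import math


def gaussian_cdf(x, mu=0.0, sigma=1.0):
    """F(x) = Phi((x - mu) / sigma), with Phi(u) = (1 + erf(u / sqrt(2))) / 2."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    u = (x - mu) / sigma
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))


# Probability that X ~ N(mu, sigma^2) falls within one standard deviation of its mean.
within_one_sigma = gaussian_cdf(1.0) - gaussian_cdf(-1.0)
print(within_one_sigma)
```

Because the distribution depends on $x$ only through $(x - \mu)/\sigma$, the same standard function $\Phi$ serves every pair $(\mu, \sigma)$.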

Theorem 2.4 Let $(\Omega, \mathcal{F}, P)$ be a probability space. A function $X: \Omega \to \mathbb{R}^d$ is an $\mathbb{R}^d$-valued random variable on $(\Omega, \mathcal{F}, P)$ if its coordinates are real-valued random variables on this probability space.

Proof If $X$ is a random vector, its coordinates $X_i = \pi_i \circ X$, $i = 1, \ldots, d$, are random variables because the projection map $\pi_i(x) = x_i$ is continuous.

Suppose now that $X_i$ are random variables. Let $\mathcal{R}$ be the collection of open rectangles in $\mathbb{R}^d$ with members $R = I_1 \times \cdots \times I_d$, where $I_i$ are open intervals in $\mathbb{R}$. The $\sigma$-field $\mathcal{B}^d$ is generated by these rectangles, that is, $\mathcal{B}^d = \sigma(\mathcal{R})$ ([11], Proposition 3.2.4), so that it is sufficient to show that $X^{-1}(\mathcal{R})$ is in $\mathcal{F}$. We have $X^{-1}(R) = \cap_{i=1}^d X_i^{-1}(I_i) \in \mathcal{F}$ since $X_i$ are random variables. Hence, $X$ is a random vector if and only if $\{\omega \in \Omega : X_i(\omega) \le x_i\} \in \mathcal{F}$ for all $x_i \in \mathbb{R}$, $i = 1, \ldots, d$. ▲

Example 2.14 Consider a sequence $X = (X_1, X_2, \ldots)$, where $X_i$ are measurable functions from $(\Omega, \mathcal{F})$ to $(\Psi, \mathcal{G})$. Let $\mathcal{K}$ be the collection of all subsets of $\mathbb{Z}^+ = \{1, 2, \ldots\}$. The function $(m, \omega) \mapsto X_m(\omega)$ depending on the arguments $m$ and $\omega$ is measurable from $(\mathbb{Z}^+ \times \Omega, \mathcal{K} \times \mathcal{F})$ to $(\Psi, \mathcal{G})$. Generally, this property does not hold if the discrete index $m$ in this example is allowed to take values in an uncountable set. ◊

Proof Let $A = \{(m, \omega) : X_m(\omega) \in B\}$ be the inverse image of the function $(m, \omega) \mapsto X_m(\omega)$ in $\mathbb{Z}^+ \times \Omega$ corresponding to an arbitrary member $B$ of $\mathcal{G}$. Because $X_m$ is measurable, the set $A$ can be expressed as the countable union $\cup_m \{m\} \times \{\omega : X_m(\omega) \in B\}$ of sets built from $\{\omega : X_m(\omega) \in B\}$, which are in $\mathcal{F}$ for each $m \ge 1$. Hence, $A$ is in $\mathcal{K} \times \mathcal{F}$. Note also that the function $m \mapsto X_m(\omega)$ is $\mathcal{K}$-measurable for a fixed $\omega \in \Omega$ since $\{m : X_m(\omega) \in B\}$ is a subset of $\mathbb{Z}^+$ so that it is in $\mathcal{K}$. ▲

Definition 2.13 Let $(\Omega, \mathcal{F})$ and $(S, \mathcal{S})$ be measurable spaces and let $X: \Omega \to S$ be a random element, that is, an $(\mathcal{F}, \mathcal{S})$-measurable mapping. The $\sigma$-field generated by $X$, denoted by $\sigma(X)$, is $\sigma(X) = X^{-1}(\mathcal{S}) = \{X^{-1}(B) : B \in \mathcal{S}\}$.

Example 2.15 Let $(\Omega, \mathcal{F}, P)$ be a probability space and $1_A: \Omega \to \mathbb{R}$, $A \in \mathcal{F}$, an indicator function. The $\sigma$-field generated by $1_A$ is $\sigma(1_A) = \{\emptyset, \Omega, A, A^c\}$. There is no smaller $\sigma$-field with respect to which $1_A$ is measurable. ◊

It is common in applications to be interested in functions of random variables. For example, we may have to find the distribution of the deformation or any other response $Y$ of a physical system with random properties $X_1$ that is subjected to a random action $X_2$. Suppose $X_1$ and $X_2$ are random variables on a probability space $(\Omega, \mathcal{F}, P)$. To achieve this objective, we first need to show that $Y$, which is a function of $(X_1, X_2)$, is a random variable on $(\Omega, \mathcal{F}, P)$. The following theorem shows that $Y$ has this property if the mapping $(X_1, X_2) \mapsto Y$ is measurable.

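Theorem 2.5 Let $h: (\Omega, \mathcal{F}) \to (\Psi, \mathcal{G})$ and $g: (\Psi, \mathcal{G}) \to (\Phi, \mathcal{H})$ be measurable functions, where $(\Omega, \mathcal{F})$, $(\Psi, \mathcal{G})$, $(\Phi, \mathcal{H})$ are measurable spaces. Then the function $g \circ h: (\Omega, \mathcal{F}) \to (\Phi, \mathcal{H})$ is measurable.

Proof We have $g^{-1}(\mathcal{H}) \subset \mathcal{G}$ and $h^{-1}(g^{-1}(\mathcal{H})) \subset h^{-1}(\mathcal{G}) \subset \mathcal{F}$ since $g$ and $h$ are measurable functions. ▲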


2.4 Independence 

We define independent $\sigma$-fields and show that this concept applies directly to characterize independent events and random elements.

Definition 2.14 Let $(\Omega, \mathcal{F}, P)$ be a probability space and $\mathcal{F}_i$, $i \in I$, a collection of sub-$\sigma$-fields of $\mathcal{F}$. If $I$ is finite, the $\sigma$-fields $\mathcal{F}_i$, $i \in I$, are independent if

$$P(\cap_{i \in I} A_i) = \prod_{i \in I} P(A_i), \quad \forall A_i \in \mathcal{F}_i. \tag{2.11}$$

If $I$ is infinite and (2.11) holds for all finite subsets of $I$, then the $\sigma$-fields $\mathcal{F}_i$, $i \in I$, are said to be independent.

The definition implies that (2.11) must hold for any subset of $\{A_i \in \mathcal{F}_i, i \in I\}$ since some $A_i$ can be selected to coincide with $\Omega$. If the $\sigma$-fields $\mathcal{F}_i$, $i \in I$, are on different sample spaces, the above independence condition needs to be applied on the corresponding product probability space.

Definition 2.15 A collection of events $A_i \in \mathcal{F}$, $i \in I$, is said to be independent if the $\sigma$-fields $\sigma(A_i) = \{\emptyset, \Omega, A_i, A_i^c\}$ are independent.

Example 2.16 Two events $A$ and $B$ on a probability space $(\Omega, \mathcal{F}, P)$ are independent if $P(A \cap B) = P(A)P(B)$. This is the classical definition for the independence between two events. ◊

Proof By definition, $A$ and $B$ are independent if the $\sigma$-fields $\sigma(A)$ and $\sigma(B)$ are independent, which implies $P(A' \cap B') = P(A')P(B')$ for all $A' \in \sigma(A)$ and $B' \in \sigma(B)$. The stated condition of independence follows by considering all possible pairs of events $(A', B')$. The converse also holds, that is, $P(A \cap B) = P(A)P(B)$ implies the independence of the $\sigma$-fields $\sigma(A)$ and $\sigma(B)$. For example, $P(A^c \cap B) = P(B) - P(A \cap B) = P(B) - P(A)P(B) = P(A^c)P(B)$.

Note also that the classical definition of independence between two events follows from that of the conditional probability. If $A$ and $B$ are independent and $P(B) > 0$, the conditional probability $P(A \mid B) = P(A \cap B)/P(B)$ is unaffected by the occurrence of $B$ so that $P(A \mid B) = P(A)$, implying $P(A \cap B) = P(A)P(B)$. ▲

Example 2.17 Consider two events $A$ and $B$ as in Example 2.16. If $A \cap B = \emptyset$ and $P(A), P(B) > 0$, then $A$ and $B$ are not independent since $P(A \cap B) = P(\emptyset) = 0$ while $P(A)P(B) > 0$. ◊

Definition 2.16 The events $A_i$, $i = 1, \ldots, n$, on a probability space $(\Omega, \mathcal{F}, P)$ are independent if

$$P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_m}) = \prod_{k=1}^{m} P(A_{i_k}) \tag{2.12}$$

holds for any subset $\{i_1, \ldots, i_m\}$ of $\{1, \ldots, n\}$.

Example 2.18 The requirement in (2.12) is essential for three or more events. It is not sufficient to satisfy (2.12) for the entire collection of events. For example, set $\Omega = \{1, 2, 3, 4\}$, $\mathcal{F}$ = all parts of $\Omega$, and $P(\{1\}) = \sqrt{2}/2 - 1/4$, $P(\{2\}) = 1/4$, $P(\{3\}) = 3/4 - \sqrt{2}/2$, and $P(\{4\}) = 1/4$. Let $A_1 = \{1, 3\}$, $A_2 = \{2, 3\}$, and $A_3 = \{3, 4\}$ be events in $(\Omega, \mathcal{F})$. The probability of $A_1 \cap A_2 \cap A_3 = \{3\}$ is $P(\{3\}) = 3/4 - \sqrt{2}/2$ and is equal to $P(A_1)P(A_2)P(A_3)$. However, $P(A_1 \cap A_2) \ne P(A_1)P(A_2)$ ([9], p. 2). ◊
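The counterexample of Example 2.18 is easy to check numerically. The sketch below, whose helper names are illustrative and not from the text, tests the product rule (2.12) for the full triple and for each pair of events; the rule holds for the triple but fails for every pair, so the three events are not independent.

```python
import math
from itertools import combinations

# Probabilities of the four outcomes, as given in Example 2.18.
P = {1: math.sqrt(2) / 2 - 0.25, 2: 0.25, 3: 0.75 - math.sqrt(2) / 2, 4: 0.25}
events = {"A1": {1, 3}, "A2": {2, 3}, "A3": {3, 4}}


def prob(event):
    """Probability of an event, i.e. of a subset of {1, 2, 3, 4}."""
    return sum(P[w] for w in event)


def product_rule_holds(names, tol=1e-12):
    """Check P(intersection of the named events) == product of their probabilities."""
    inter = set.intersection(*(events[k] for k in names))
    target = math.prod(prob(events[k]) for k in names)
    return abs(prob(inter) - target) < tol


triple_ok = product_rule_holds(("A1", "A2", "A3"))
pairs_ok = {pair: product_rule_holds(pair) for pair in combinations(events, 2)}
print(triple_ok, pairs_ok)
```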
Definition 2.17 Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $\mathcal{C}_i \subset \mathcal{F}$, $i \in I$, be families of events in $\mathcal{F}$. If $I = \{1, \ldots, n\}$ is finite, $\mathcal{C}_i$ are said to be independent if any $A_1 \in \mathcal{C}_1, \ldots, A_n \in \mathcal{C}_n$ are independent events. If $I$ is not finite, $\mathcal{C}_i$ are independent if $\mathcal{C}_i$, $i \in J$, are independent families for each finite $J \subset I$.

This definition and the following criterion can be used to show that two or more $\sigma$-fields are independent. The criterion uses classes of events forming a $\pi$-system. A collection $\mathcal{C}$ of subsets of $\Omega$ is said to be a $\pi$-system if it is closed to finite intersection, that is, $A, B \in \mathcal{C}$ implies $A \cap B \in \mathcal{C}$. If $\mathcal{C}_i$ is a nonempty class of events in $\mathcal{F}$ for each $i = 1, \ldots, n$ such that (1) $\mathcal{C}_i$ is a $\pi$-system and (2) $\mathcal{C}_i$, $i = 1, \ldots, n$, are independent, then the $\sigma$-fields $\sigma(\mathcal{C}_i)$, $i = 1, \ldots, n$, are independent ([11], Theorem 4.1.1).

Definition 2.18 Let $(\Omega, \mathcal{F}, P)$ be a probability space, $S$ a metric space, $\mathcal{S}$ the $\sigma$-field induced on $S$ by its metric, and $X_i: \Omega \to S$, $i \in I$, a collection of random elements, that is, a collection of $(\mathcal{F}, \mathcal{S})$-measurable functions, where $I$ may or may not be finite. The random elements $X_i$ are independent if the $\sigma$-fields $\sigma(X_i) = X_i^{-1}(\mathcal{S})$, $i \in I$, generated by these elements are independent.

Example 2.19 Let $X$ and $Y$ be real-valued random variables defined on $(\Omega, \mathcal{F}, P)$. The independence of $\sigma(X)$ and $\sigma(Y)$ implies $P(X^{-1}((-\infty, x]) \cap Y^{-1}((-\infty, y])) = P(X \le x)P(Y \le y) = F_X(x)F_Y(y)$, $x, y \in \mathbb{R}$, where $F_X$ and $F_Y$ denote the distributions of $X$ and $Y$. The converse also holds, that is, $P(X \le x, Y \le y) = F_X(x)F_Y(y)$ implies the independence of $\sigma(X)$ and $\sigma(Y)$. For example, $P(\{a_1 < X \le a_2\} \cap \{b_1 < Y \le b_2\}) = P(X \le a_2, Y \le b_2) - P(X \le a_1, Y \le b_2) - P(X \le a_2, Y \le b_1) + P(X \le a_1, Y \le b_1)$ by properties of the probability measure, or $P(\{a_1 < X \le a_2\} \cap \{b_1 < Y \le b_2\}) = F_X(a_2)F_Y(b_2) - F_X(a_1)F_Y(b_2) - F_X(a_2)F_Y(b_1) + F_X(a_1)F_Y(b_1)$, so that $P(\{a_1 < X \le a_2\} \cap \{b_1 < Y \le b_2\}) = (F_X(a_2) - F_X(a_1))(F_Y(b_2) - F_Y(b_1)) = P(a_1 < X \le a_2)P(b_1 < Y \le b_2)$. The latter relationship implies the independence of the $\sigma$-fields $\sigma(X)$ and $\sigma(Y)$ since the intervals $(a_1, a_2]$ and $(b_1, b_2]$ are arbitrary. ◊

Example 2.20 Let $X_k$, $k = 1, 2, \ldots$, be independent, real-valued random variables on a probability space $(\Omega, \mathcal{F}, P)$ and $\varphi_k: \mathbb{R} \to \mathbb{R}$ Borel measurable functions. The random variables $\varphi_k \circ X_k$, $k = 1, 2, \ldots$, are independent.

Proof We have $\varphi_k^{-1}(\mathcal{B}) \subset \mathcal{B}$ and $X_k^{-1}(\varphi_k^{-1}(\mathcal{B})) \subset \mathcal{F}$ because $\varphi_k$ is a Borel measurable function and $X_k$ is a random variable. Since $X_k^{-1}(\varphi_k^{-1}(\mathcal{B})) \subset X_k^{-1}(\mathcal{B})$ and the $\sigma$-fields $X_k^{-1}(\mathcal{B})$, $k = 1, 2, \ldots$, are independent by assumption, the random variables $\varphi_k \circ X_k$ are independent. ◊

Example 2.21 Let $X(t)$ and $Y(t)$, $t \in [0, 1]$, be simple real-valued processes with continuous samples $(x_1(t), \ldots, x_m(t))$ and $(y_1(t), \ldots, y_n(t))$ occurring with probabilities $(p_1, \ldots, p_m)$ and $(q_1, \ldots, q_n)$, respectively. It is assumed that both processes are defined on the same probability space $(\Omega, \mathcal{F}, P)$, so that they are measurable functions from $(\Omega, \mathcal{F})$ to $(C[0, 1], \mathcal{C})$, and $A_i = \{\omega \in \Omega : X(\omega) = x_i\}$, $i = 1, \ldots, m$, and $B_j = \{\omega \in \Omega : Y(\omega) = y_j\}$, $j = 1, \ldots, n$, are measurable partitions of $\Omega$. The processes $X$ and $Y$ are independent if the $\sigma$-fields generated by $\{A_i, i = 1, \ldots, m\}$ and $\{B_j, j = 1, \ldots, n\}$ are independent. ◊


2.5 Sequence of Events 

Let $\{A_n, n = 1, 2, \ldots\}$ be a sequence of events in a probability space $(\Omega, \mathcal{F}, P)$. Properties of probability measures that involve increasing/decreasing, convergent, and arbitrary sequences of events are discussed.

Definition 2.19 The sequence $\{A_n, n = 1, 2, \ldots\}$ is said to be increasing if $A_n \subset A_{n+1}$ for all $n$. If $A_n \supset A_{n+1}$ for all $n$, the sequence is decreasing. The sequence is convergent if $\limsup_{n \to \infty} A_n = \cap_{n=1}^{\infty} \cup_{k=n}^{\infty} A_k$ and $\liminf_{n \to \infty} A_n = \cup_{n=1}^{\infty} \cap_{k=n}^{\infty} A_k$ coincide, and we use the notation $\lim_{n \to \infty} A_n = \limsup_{n \to \infty} A_n = \liminf_{n \to \infty} A_n$ for the limit of $\{A_n\}$. Note that $\limsup_{n \to \infty} A_n$, $\liminf_{n \to \infty} A_n$, and $\lim_{n \to \infty} A_n$ are events in $(\Omega, \mathcal{F}, P)$.

Theorem 2.6 (Continuity of probability measure) Let $\{A_n, n = 1, 2, \ldots\}$ be an increasing or decreasing sequence of events. The numerical sequence $\{P(A_n), n = 1, 2, \ldots\}$ is increasing or decreasing and converges to $P(A)$, where $A = \lim_{n \to \infty} A_n$.

Proof Suppose $\{A_n\}$ is increasing, so that it converges to $A = \lim_{n \to \infty} A_n = \cup_{n=1}^{\infty} A_n$. Set $B_1 = A_1$ and $B_n = A_n \setminus A_{n-1}$, $n = 2, 3, \ldots$, so that $A_n = \cup_{k=1}^{n} B_k$, $A = \cup_{n=1}^{\infty} B_n$, and $P(A) = P(\cup_{n=1}^{\infty} B_n) = \sum_{n=1}^{\infty} P(B_n) = \lim_{n \to \infty} \sum_{k=1}^{n} P(B_k) = \lim_{n \to \infty} P(\cup_{k=1}^{n} B_k) = \lim_{n \to \infty} P(A_n)$ since $\{B_n\}$ are disjoint events. Similar arguments hold for decreasing sequences. ▲

A direct consequence of Theorem 2.6 is that, for a sequence $\{A_n, n = 1, 2, \ldots\}$ of convergent events, probability and limit can be interchanged, that is,

$$\lim_{n \to \infty} P(A_n) = P(\lim_{n \to \infty} A_n) = P(A), \tag{2.13}$$

where $A = \limsup_{n \to \infty} A_n = \liminf_{n \to \infty} A_n$ (Exercise 2.9 and [11], Sect. 2.1).

Let $\{A_n, n = 1, 2, \ldots\}$ be a sequence of events in a probability space $(\Omega, \mathcal{F}, P)$, and let $A = \limsup_{n \to \infty} A_n = \cap_{n=1}^{\infty} \cup_{k=n}^{\infty} A_k$ be an event in this space. It can be seen that $\omega \in A$ if and only if $\omega \in A_n$ for infinitely many indices $n$. We use the notation

$$\{A_n \text{ i.o.}\} = \{A_n \text{ infinitely often}\} = \cap_{n=1}^{\infty} \cup_{k=n}^{\infty} A_k = \limsup_{n \to \infty} A_n \tag{2.14}$$

to indicate this property.

Theorem 2.7 (Borel-Cantelli lemma) If $\{A_n, n = 1, 2, \ldots\}$ is a sequence of events such that $\sum_{n=1}^{\infty} P(A_n) < \infty$, then $P(A_n \text{ i.o.}) = 0$.

If $\{A_n, n = 1, 2, \ldots\}$ is a sequence of independent events, then $P(A_n \text{ i.o.}) = 0$ and $P(A_n \text{ i.o.}) = 1$ if and only if $\sum_{n=1}^{\infty} P(A_n) < \infty$ and $\sum_{n=1}^{\infty} P(A_n) = \infty$, respectively.

Proof We have $P(A_n \text{ i.o.}) = \lim_{n \to \infty} P(\cup_{k=n}^{\infty} A_k) \le \lim_{n \to \infty} \sum_{k=n}^{\infty} P(A_k) = 0$, where the first and the last equalities hold since $\{\cup_{k=n}^{\infty} A_k, n = 1, 2, \ldots\}$ is a decreasing sequence of events and $\sum_{n=1}^{\infty} P(A_n)$ is convergent by assumption.

The proof of the second part of the Borel-Cantelli lemma can be found in [11] (Proposition 4.5.2). ▲
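Both parts of the Borel-Cantelli lemma can be illustrated without simulation when the events are independent, because $P(\cup_{k=n}^{N} A_k) = 1 - \prod_{k=n}^{N} (1 - P(A_k))$. The sketch below is an illustration under assumptions chosen here, not an example from the text: it takes $P(A_k) = 1/k^2$ (summable) and $P(A_k) = 1/k$ (divergent), and the function names are hypothetical. In the first case the probability that some $A_k$ with $k \ge n$ occurs is close to $1/n$ and vanishes as $n$ grows; in the second it stays near 1 for every $n$.

```python
import math


def prob_some_event(p, n, n_max):
    """P(at least one of A_n, ..., A_n_max occurs) for independent events with P(A_k) = p(k)."""
    # Work with log(1 - p_k) so that the long product does not lose accuracy.
    log_none = sum(math.log1p(-p(k)) for k in range(n, n_max + 1))
    return 1.0 - math.exp(log_none)


def summable(k):
    """P(A_k) = 1/k**2, so the series of probabilities converges."""
    return 1.0 / k**2


def divergent(k):
    """P(A_k) = 1/k, so the series of probabilities diverges."""
    return 1.0 / k


for n in (2, 10, 100):
    print(n, prob_some_event(summable, n, 10**5), prob_some_event(divergent, n, 10**5))
```

The starting index is taken as $n \ge 2$ so that every $P(A_k) < 1$ and the logarithm is defined.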

Example 2.22 Let $\{X_n, n = 1, 2, \ldots\}$ be a sequence of Bernoulli random variables taking the values 1 and 0 with probabilities $P(X_n = 1) = p_n = 1 - P(X_n = 0)$, $n \ge 1$. If $\sum_{n=1}^{\infty} p_n < \infty$, then $P(\{X_n = 1\} \text{ i.o.}) = 0$ by the Borel-Cantelli lemma, so that, with probability one, $X_n$ is equal to 1 only a finite number of times. Other illustrations of the use of the Borel-Cantelli lemma can be found in [11] (Sect. 4.5). ◊

Example 2.23 Let $p_0 = P(V \le v_{cr})$ denote the probability that the yearly wind speed maximum does not exceed a critical value $v_{cr}$ at a site. The event $\{V \le v_{cr}\}$ occurs infinitely often with probability one since $\sum_{n=1}^{\infty} p_0 = \infty$ for $p_0 > 0$ and the events $\{V \le v_{cr}\}$ in distinct years are independent. The statement follows from the second part of the Borel-Cantelli lemma. ◊


2.6 Expectation 

The expectation operator is first defined for simple random variables. The definition 
is then extended to positive and arbitrary random variables, random vectors, and 
random functions. The expectation operator is used to define moments of random 
elements. Fubini’s theorem and applications of this theorem conclude the section. 

Definition 2.20 Let $(\Omega, \mathcal{F}, P)$ be a probability space, $\{A_i \in \mathcal{F}, i \in I\}$ a partition of $\Omega$, $I$ a finite index set, and $a_i \in \mathbb{R}^d$ such that $\|a_i\| < \infty$. Then

$$X = \sum_{i \in I} a_i 1_{A_i}, \quad a_i \in \mathbb{R}^d, \tag{2.15}$$

is called a finite, simple $\mathbb{R}^d$-valued random variable. The collection of simple random variables is a vector space (Exercise 2.11).

Definition 2.21 The expectation of $X$ in (2.15), denoted by $E[X]$, $\int_{\Omega} X\,dP$, or $\int_{\Omega} X(\omega)\,dP(\omega)$, is

$$E[X] = \int_{\Omega} X\,dP = \int_{\Omega} X(\omega)\,dP(\omega) = \sum_{i \in I} a_i P(A_i). \tag{2.16}$$

Consider a mapping $g: \mathbb{R}^d \to \mathbb{R}^{d'}$ such that $\|g(a_i)\| < \infty$, $i \in I$. Then, $g(X)$ is a simple $\mathbb{R}^{d'}$-valued random variable with expectation

$$E[g(X)] = \sum_{i \in I} g(a_i) P(A_i). \tag{2.17}$$

The definition is meaningful since $\|g(a_i)\| < \infty$, $i \in I$.

Following are the properties of the expectation for simple random variables. 
Except for Jensen’s inequality (Exercise 2.12), all the other properties result directly 
from (2.16) and (2.17). The random variables in the first three properties given by 
(2.18) are real-valued. 


If $a_i \ge 0$ in (2.15), then $E[X] \ge 0$,
If $X \le Y$ a.s., then $E[X] \le E[Y]$,
If $g: \mathbb{R} \to \mathbb{R}$ is convex, then $g(E[X]) \le E[g(X)]$ (Jensen's inequality),
$E[\alpha X + \beta Y] = \alpha E[X] + \beta E[Y]$, $\alpha, \beta \in \mathbb{R}$, and
$\int_{\cup_i B_i} X\,dP = \sum_i \int_{B_i} X\,dP$, where $\{B_i \in \mathcal{F}\}$ is a measurable partition of $\Omega$. (2.18)

The second and the fourth properties show that $E[\cdot]$ is a monotone and a linear operator. Jensen's inequality implies $|E[X]| \le E[|X|]$ since the absolute value is a convex function. The inequality also follows from the definition of the expectation, which gives $|E[X]| = |\sum_{i \in I} a_i P(A_i)| \le \sum_{i \in I} |a_i| P(A_i) = E[|X|]$.

We now extend the definition of expectation in (2.16) to positive real-valued 
random variables and, subsequently, arbitrary random variables. 


Definition 2.22 Let $X$ be a real-valued, positive random variable defined on a probability space $(\Omega, \mathcal{F}, P)$, that is, $P(X \ge 0) = 1$. If $P(X = \infty) > 0$, set $E[X] = \infty$. Otherwise,

$$E[X] = \lim_{n \to \infty} E[X_n], \tag{2.19}$$

where $X_n$ is an approximating sequence of finite-valued simple random variables such that $X_n \uparrow X$ and $E[X_n]$ is finite for each $n$.

The definition is meaningful since (1) there exists an increasing sequence of simple random variables $X_n \ge 0$, $n = 1, 2, \ldots$, referred to as an approximating sequence of $X$, such that $\lim_{n \to \infty} X_n(\omega) = X(\omega)$ for almost all $\omega$'s ([11], Theorem 5.1.1) and (2) the expectation in (2.19) is well-defined since the value of $E[X]$ does not change if $\{X_n\}$ is replaced with another approximating sequence $Y_m \uparrow X$ ([11], Proposition 5.2.1).
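The limit in (2.19) can be followed explicitly in a simple case. The sketch below assumes, for illustration only, that $X$ is uniformly distributed on $[0, 1]$ and uses the dyadic approximation $X_n = \lfloor 2^n X \rfloor / 2^n$, a standard construction that is not necessarily the one used in [11]. Each $X_n$ is a simple random variable taking the value $k/2^n$ with probability $2^{-n}$, and its expectation, computed from (2.16) with exact fractions, increases to $E[X] = 1/2$.

```python
from fractions import Fraction


def expectation_dyadic_approx(n):
    """E[X_n] for X_n = floor(2**n * X) / 2**n, with X uniform on [0, 1] (an assumption)."""
    m = 2**n
    # X_n equals k/m on the event {k/m <= X < (k+1)/m}, which has probability 1/m.
    return sum(Fraction(k, m) * Fraction(1, m) for k in range(m))


values = [expectation_dyadic_approx(n) for n in range(1, 11)]
for n, v in enumerate(values, start=1):
    print(n, v, float(v))
```

The gap $1/2 - E[X_n] = 2^{-(n+1)}$ halves with each refinement, which makes the monotone convergence in (2.19) visible term by term.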

Definition 2.23 Let $X$ be a real-valued random variable defined on a probability space $(\Omega, \mathcal{F}, P)$, and set $X^+ = X \vee 0 = \max(X, 0)$ and $X^- = (-X) \vee 0 = \max(-X, 0)$. The expectation of $X$ is

$E[X] = E[X^+] - E[X^-]$, if $E[X^+] < \infty$ and/or $E[X^-] < \infty$,
$E[X]$ does not exist, if $E[X^+] = E[X^-] = \infty$. (2.20)

If both expectations $E[X^+]$ and $E[X^-]$ are finite, then $E[X]$ exists, is finite, and has the expression $E[X] = E[X^+] - E[X^-]$. If only one of the expectations $E[X^+]$ and $E[X^-]$ is finite, then $E[X]$ exists but is unbounded. As for simple random variables, we have

$$\int_B X\,dP = E[X 1_B], \quad B \in \mathcal{F}, \tag{2.21}$$

and, if $E[X 1_B]$ is finite, we say that $X$ is integrable with respect to $P$ over $B$ or just $P$-integrable over $B$. The definitions in (2.20) and (2.21) extend directly to random vectors and complex-valued random variables by applying them to the coordinates and real/imaginary parts of these variables, respectively.

Example 2.24 Let $X(\omega) = i^2 + j^2$ be a random variable defined on $(\Omega, \mathcal{F}, P)$, where $\Omega = \{\omega = (i, j) : i, j = 1, 2, \ldots, 6\}$ is the sample space for the experiment of rolling two dice, $\mathcal{F}$ consists of all subsets of $\Omega$, and $P(\{\omega\}) = 1/36$. Then $X$ is a positive, simple random variable and $E[X 1_B] = \sum_{\omega \in B} (i^2 + j^2)(1/36) = [(2)(40) + (2)(34) + 32]/36 = 5$ for $B = \{\omega = (i, j) : i + j = 8\}$. ◊
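The value in Example 2.24 can be reproduced by enumerating the 36 equally likely outcomes of the two dice. The following sketch uses exact fractions; the function and variable names are illustrative and do not come from the text.

```python
from fractions import Fraction
from itertools import product

sample_space = list(product(range(1, 7), repeat=2))  # outcomes (i, j) of two dice
point_prob = Fraction(1, 36)                         # P({omega}) for every outcome


def expectation(x, event=lambda w: True):
    """E[X 1_B] = sum of X(omega) P({omega}) over omega in B (B = Omega by default)."""
    return sum(x(w) * point_prob for w in sample_space if event(w))


def X(w):
    i, j = w
    return i**2 + j**2


def in_B(w):
    return w[0] + w[1] == 8


e_X_on_B = expectation(X, in_B)  # E[X 1_B] of Example 2.24
e_X = expectation(X)             # E[X] over the whole sample space
print(e_X_on_B, e_X)
```

The same helper evaluates $E[X 1_B]$ for any other event $B$ by swapping the indicator function.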

The second and the third properties in (2.18) are also valid for the expectation in 
(2.20) (Exercises 2.12-2.14 and [4], Sect. 3.2). Following are the properties specific 
to the expectation in (2.20). 


Theorem 2.8 Let $X$ be a real-valued random variable defined on a probability space $(\Omega, \mathcal{F}, P)$. Then ([4], Sect. 3.2)

If $X$ is $P$-integrable, then $X$ is finite a.s., that is, $N = \{\omega : X(\omega) = \pm\infty\} \in \mathcal{F}$ and $P(N) = 0$. (2.22)


Definition 2.24 Let $X$ be a real-valued random variable defined on a probability space $(\Omega, \mathcal{F}, P)$ and let $q \ge 1$ be an integer. If $\mu(q) = E[X^q]$ exists and is finite, it is called the moment of order $q$ of $X$. The mean of $X$ is $\mu = \mu(1)$. If $E[(X - \mu)^q]$ exists and is finite, it is called the central moment of order $q$ of $X$. The central moment $\sigma^2 = E[(X - \mu)^2]$ is the variance of $X$. The square root $\sigma$ of the variance $\sigma^2$ is the standard deviation of $X$. The scaled versions of the third and fourth central moments $\gamma_3 = E[(X - \mu)^3]/\sigma^3$ and $\gamma_4 = E[(X - \mu)^4]/\sigma^4$ are the skewness and kurtosis coefficients of $X$. The ratio $v = \sigma/\mu$ defined for $\mu \ne 0$ is called coefficient of variation.

The definition of moments of $X$ is meaningful since the mappings $X \mapsto X^q$ and $X \mapsto (X - \mu)^q$ are continuous and, therefore, measurable. Hence, $X^q$ and $(X - \mu)^q$ are random variables on $(\Omega, \mathcal{F}, P)$.
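The quantities in Definition 2.24 are straightforward to compute for a simple random variable. As a hedged illustration, the sketch below assumes $X$ is the outcome of one fair die, a choice made here and not taken from the text, and evaluates the mean, variance, skewness, kurtosis, and coefficient of variation from the definitions. Exact fractions are used wherever no square root is needed.

```python
from fractions import Fraction

values = range(1, 7)           # outcomes of a fair die (illustrative assumption)
probs = [Fraction(1, 6)] * 6   # equal probabilities


def moment(q, center=0):
    """E[(X - center)**q]; center = 0 gives the moment mu(q) of order q."""
    return sum((Fraction(v) - center) ** q * p for v, p in zip(values, probs))


mean = moment(1)                                  # mu = mu(1)
variance = moment(2, mean)                        # sigma^2
third_central = moment(3, mean)                   # E[(X - mu)^3]
kurtosis = moment(4, mean) / variance**2          # gamma_4, exact
skewness = float(third_central) / float(variance) ** 1.5  # gamma_3
coeff_variation = float(variance) ** 0.5 / float(mean)    # v = sigma / mu
print(mean, variance, skewness, kurtosis, coeff_variation)
```

The skewness vanishes because the distribution is symmetric about its mean, so every odd central moment is zero.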

Definition 2.25 Let $X = (X_1, \ldots, X_d)$ be an $\mathbb{R}^d$-valued random variable defined on a probability space $(\Omega, \mathcal{F}, P)$ and let $q \ge 1$ be an integer. The moments of order $q$ of $X$ are

$$\mu(q_1, \ldots, q_d) = E\Big[\prod_{i=1}^{d} X_i^{q_i}\Big] \tag{2.23}$$

provided the expectations in (2.23) exist and are finite, where $q_i \ge 0$ are integers such that $\sum_{i=1}^{d} q_i = q$. The mean or the expectation of $X$ is the vector $E[X] = (E[X_1] = \mu(1, 0, \ldots, 0), \ldots, E[X_d] = \mu(0, \ldots, 0, 1))$. The $(d, d)$-matrix $r = \{r_{ij} = E[X_i X_j], i, j = 1, \ldots, d\}$ whose entries are moments of order $q = 2$ is called the correlation matrix of $X$. The formula in (2.23) with $X_i - E[X_i]$ in place of $X_i$, $i = 1, \ldots, d$, gives the central moments of order $q$ of $X$. The $(d, d)$-matrix $c = \{c_{ij} = E[(X_i - E[X_i])(X_j - E[X_j])], i, j = 1, \ldots, d\}$ is called the covariance matrix of $X$. The scaled version of the covariance matrix $\rho = \{\rho_{ij} = E[(X_i - E[X_i])(X_j - E[X_j])]/(\sigma_i \sigma_j), i, j = 1, \ldots, d\}$ is the correlation coefficient matrix, where $\sigma_i^2 = c_{ii}$, $i = 1, \ldots, d$.

The definition in (2.23) is meaningful since the mapping $X \mapsto \prod_{i=1}^{d} X_i^{q_i}$ is continuous so that it is measurable, implying that $\prod_{i=1}^{d} X_i^{q_i}$ is a random variable defined on the same probability space as $X$. Similar considerations hold for the mapping $X \mapsto \prod_{i=1}^{d} (X_i - E[X_i])^{q_i}$.

Definition 2.26 Let $X$ be an $\mathbb{R}^d$-valued random variable with finite moments of order 2. Two distinct coordinates $X_i$ and $X_j$ of $X$ are said to be orthogonal if $r_{ij} = E[X_i X_j] = 0$. If $c_{ij} = E[(X_i - E[X_i])(X_j - E[X_j])] = 0$, then $X_i$ and $X_j$ are said to be uncorrelated. The correlations and covariances of $X$ coincide if $E[X_i] = 0$ for all $i = 1, \ldots, d$.

Example 2.25 Let $X = U + iV$ be a complex-valued random variable, where $U$ and $V$ are real-valued random variables. If $E[|U|] < \infty$ and $E[|V|] < \infty$, then $E[X]$ exists and is equal to $E[X] = E[U] + iE[V]$, that is, the real and imaginary parts of $X$ are viewed as the coordinates of a two-dimensional vector. ◊

The coordinate by coordinate definition of the expectation for random vectors 
extends directly to random functions by viewing their values at various arguments 
as coordinates of a random vector with finite, countable, or uncountable number of 
coordinates. 

Definition 2.27 Let $X(t) = (X_1(t), \ldots, X_d(t))$, $t \in D \subset \mathbb{R}^{d'}$, be an $\mathbb{R}^d$-valued random function defined on a probability space $(\Omega, \mathcal{F}, P)$. The expectation of $X(t)$ exists if and only if the expectations of all its coordinates $\{X_i(t), t \in D, i = 1, \ldots, d\}$ exist. In this case, we have $E[X(t)] = (E[X_1(t)], \ldots, E[X_d(t)])$, $t \in D$.

Example 2.26 Let $Y_1$ and $Y_2$ be random variables on a probability space $(\Omega, \mathcal{F}, P)$ such that $E[|Y_k|] < \infty$, $k = 1, 2$. The $\mathbb{R}^2$-valued random function $X(t) = (X_1(t) = Y_1 \cos(t), X_2(t) = Y_2 \sin(t))$, $t \in [0, 2\pi]$, is defined on $(\Omega, \mathcal{F}, P)$ and has expectation $(E[X_1(t)] = E[Y_1] \cos(t), E[X_2(t)] = E[Y_2] \sin(t))$, $t \in [0, 2\pi]$. ◊

We conclude this section with the statement of Fubini's theorem specifying conditions under which integrals of measurable functions defined on product probability spaces can be performed sequentially. The theorem is used extensively in calculations, as illustrated by the examples that follow.

Theorem 2.9 (Fubini's theorem) Let $(\Omega_k, \mathcal{F}_k, P_k)$, $k = 1, 2$, be two complete probability spaces and denote by $\Omega = \Omega_1 \times \Omega_2$, $\mathcal{F} = \mathcal{F}_1 \times \mathcal{F}_2$, and $P = P_1 \times P_2$ the product sample space, $\sigma$-field, and probability measure. If $(\omega_1, \omega_2) \mapsto X(\omega_1, \omega_2)$ is $\mathcal{F}$-measurable and $P$-integrable, then

$X(\omega_1, \cdot)$ is $\mathcal{F}_2$-measurable and $P_2$-integrable for each $\omega_1 \in \Omega_1$,
$\int_{\Omega_2} X(\cdot, \omega_2)\,P_2(d\omega_2)$ is $\mathcal{F}_1$-measurable and $P_1$-integrable, and

$$\int_{\Omega} X(\omega)\,P(d\omega) = \int_{\Omega_1} \Big[\int_{\Omega_2} X(\omega_1, \omega_2)\,P_2(d\omega_2)\Big] P_1(d\omega_1). \tag{2.24}$$

If in addition $X$ is positive and either side of (2.24) exists and is finite or infinite, so is the other side of the equality, and the equality is valid ([4], p. 59).

Example 2.27 Let $(\Omega, \mathcal{F}, P)$ and $([0, 1], \mathcal{B}([0, 1]), \lambda)$ be measure spaces, where $\lambda$ denotes the Lebesgue measure. Let $X: ([0, 1] \times \Omega, \mathcal{B}([0, 1]) \times \mathcal{F}) \to (\mathbb{R}, \mathcal{B})$ be a measurable function defined on the product of these spaces endowed with the product measure $\lambda \times P$. It is common to interpret the first argument of $X$ as time. The integral $I(A, \omega) = \int_A 1_B(X(s, \omega))\,ds$, $A \in \mathcal{B}([0, 1])$, $B \in \mathcal{B}$, represents the time $X(\cdot, \omega)$, $\omega \in \Omega$, spends in $B$ during a time interval $A$. The expectation of this occupation time is $E[I(A, \omega)] = \int_A P(X(s) \in B)\,ds$. ◊

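Proof The measurable mapping $(s, \omega) \mapsto X(s, \omega)$ is said to be a stochastic process, and $s \mapsto X(s, \omega)$ is sample $\omega$ of $X$. For $\mathcal{K} = \{\emptyset, \{0, 1\}, \{0\}, \{1\}\}$, the indicator function $1_B: (\mathbb{R}, \mathcal{B}) \to (\{0, 1\}, \mathcal{K})$, $B \in \mathcal{B}$, is measurable so that $1_B \circ X: ([0, 1] \times \Omega, \mathcal{B}([0, 1]) \times \mathcal{F}) \to (\{0, 1\}, \mathcal{K})$ is also measurable. The expectation of the occupation time is

$$E[I(A, \omega)] = \int_{A \times \Omega} 1_B(X(s, \omega))\,ds\,P(d\omega) = \int_{\Omega} \Big[\int_A 1_B(X(s, \omega))\,ds\Big] P(d\omega) = \int_A \Big[\int_{\Omega} 1_B(X(s, \omega))\,P(d\omega)\Big] ds = \int_A P(X(s) \in B)\,ds,$$

by Fubini's theorem ([11], Example 5.9.1). ▲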

Example 2.28 Consider a cantilever beam with unit length and stiffness that is fixed at its left end and subjected to a distributed random load $X(x)$, $x \in [0, 1]$. It is assumed that the mapping $(x, \omega) \mapsto X(x, \omega)$ is measurable from $([0, 1] \times \Omega, \mathcal{B}[0, 1] \times \mathcal{F})$ to $(\mathbb{R}, \mathcal{B})$ and $\lambda \times P$-integrable, where $\lambda$ denotes the Lebesgue measure on the real line. The beam displacement $U(1)$ at its free end and its expectation are

$$U(1) = \int_{[0,1]^3} X(x + u)\,u\,1(0 < u < 1 - y,\ 0 < y < z)\,du\,dy\,dz \quad \text{and}$$

$$E[U(1)] = \int_{[0,1]^3} E[X(x + u)]\,u\,1(0 < u < 1 - y,\ 0 < y < z)\,du\,dy\,dz.$$

If $E[X(x)] = q$ is space invariant, then $E[U(1)] = q/8$. ◊

Proof The beam displacement satisfies the equation $U''(x) = -M(x)$ with boundary conditions $U(0) = 0$ and $U'(0) = 0$, where $M(x) = -\int_0^{1-x} X(x + u)\,u\,du$ denotes the bending moment in the beam. The solution of this equation is $U(x) = \int_0^x dz \int_0^z dy \int_0^{1-y} X(x + u)\,u\,du$, so that

$$E[U(1)] = \int_{\Omega \times [0,1]^3} X(x + u, \omega)\,u\,1(0 < u < 1 - y,\ 0 < y < z)\,du\,dy\,dz\,P(d\omega),$$

where $du\,dy\,dz$ is the Lebesgue measure on $[0, 1]^3$. Since $(u, y, z, \omega) \mapsto X(x + u, \omega)$ is measurable, we have

$$E[U(1)] = \int_{[0,1]^3} \Big[\int_{\Omega} X(x + u, \omega)\,P(d\omega)\Big] u\,1(0 < u < 1 - y,\ 0 < y < z)\,du\,dy\,dz,$$

by Fubini's theorem, which gives the stated formula of $E[U(1)]$. ▲
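The value $E[U(1)] = q/8$ for a space-invariant mean load can be checked by evaluating the triple integral of $q\,u\,1(0 < u < 1 - y,\ 0 < y < z)$ over $[0, 1]^3$. The sketch below is a numerical check under choices made here, not a method from the text: the innermost integral over $u$ is taken in closed form, $\int_0^{1-y} q\,u\,du = q(1 - y)^2/2$, and the two remaining integrals use a composite midpoint rule with hypothetical helper names.

```python
def midpoint(f, a, b, n=200):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def mean_tip_displacement(q):
    """E[U(1)] when E[X(x)] = q does not depend on x."""

    def inner_u(y):
        # Closed form of the integral of q * u over 0 < u < 1 - y.
        return q * (1.0 - y) ** 2 / 2.0

    def inner_y(z):
        # Integral over 0 < y < z of the result above.
        return midpoint(inner_u, 0.0, z)

    return midpoint(inner_y, 0.0, 1.0)  # integral over 0 < z < 1


print(mean_tip_displacement(1.0))  # close to 1/8
```

Carrying the same iterated integrals out by hand gives $\int_0^1 [1 - (1 - z)^3]/6\,dz = 1/8$, in agreement with the stated result.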

Example 2.29 Let $X(t, \omega)$ and $Y(t, \omega)$ be real-valued random variables defined on a probability space $([0, \tau] \times \Omega, \mathcal{B}[0, \tau] \times \mathcal{F}, \lambda \times P)$, $0 < \tau < \infty$. Suppose $X$ is the solution of the differential equation $\ddot{X}(t, \omega) + 2\zeta\nu_0 \dot{X}(t, \omega) + \nu_0^2 X(t, \omega) = Y(t, \omega)$ with $X(0, \omega) = 0$, $\dot{X}(0, \omega) = 0$, where $\zeta \in (0, 1)$ and $\nu_0 > 0$ are constants, and the dots denote time derivatives. We will see that the solution of this equation is a stochastic process, that is, a family of random variables indexed by $t \in [0, \tau]$, defined by $X(t, \omega) = \int_0^t h(t - s) Y(s, \omega)\,ds$, where $h(u) = \exp(-\zeta\nu_0 u) \sin(\nu_d u)/\nu_d$ and $\nu_d = \nu_0 \sqrt{1 - \zeta^2}$. The expectation of this process at an arbitrary time $t$ is the double integral $E[X(t, \omega)] = \int_{\Omega} [\int_0^t h(t - s) Y(s, \omega)\,ds]\,P(d\omega)$. If $h(t - s) Y(s, \omega)$ is $\mathcal{B}[0, \tau] \times \mathcal{F}$-measurable and $\lambda \times P$-integrable, Fubini's theorem applies and gives

$$E[X(t, \omega)] = \int_0^t h(t - s)\,E[Y(s, \omega)]\,ds \tag{2.25}$$

so that $E[X(t, \omega)] = \mu_Y \int_0^t h(t - s)\,ds$ if $E[Y(s, \omega)] = \mu_Y$ is time invariant. ◊
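For a time-invariant input mean, (2.25) reduces to a single deterministic integral that is easy to evaluate. The sketch below computes $\mu_Y \int_0^t h(t - s)\,ds$ with a composite midpoint rule and compares it with a closed form. The closed form is derived here by integrating $h$ directly and is not quoted from the text; the function names and the numerical values of $\zeta$, $\nu_0$, and $\mu_Y$ are likewise hypothetical.

```python
import math


def impulse_response(u, zeta, nu0):
    """h(u) = exp(-zeta * nu0 * u) * sin(nu_d * u) / nu_d, nu_d = nu0 * sqrt(1 - zeta**2)."""
    nud = nu0 * math.sqrt(1.0 - zeta**2)
    return math.exp(-zeta * nu0 * u) * math.sin(nud * u) / nud


def mean_response(t, mu_y, zeta, nu0, n=4000):
    """E[X(t)] = mu_y * integral over (0, t) of h(t - s) ds, by the composite midpoint rule."""
    ds = t / n
    return mu_y * ds * sum(impulse_response(t - (i + 0.5) * ds, zeta, nu0) for i in range(n))


def mean_response_closed_form(t, mu_y, zeta, nu0):
    """Closed form of the same integral (derived for this sketch, not from the text)."""
    nud = nu0 * math.sqrt(1.0 - zeta**2)
    decay = math.exp(-zeta * nu0 * t)
    return mu_y / nu0**2 * (1.0 - decay * (math.cos(nud * t) + zeta * nu0 / nud * math.sin(nud * t)))


# Hypothetical parameter values, chosen only for illustration.
print(mean_response(10.0, 1.5, 0.1, 2.0), mean_response_closed_form(10.0, 1.5, 0.1, 2.0))
```

As $t$ grows the exponential factor vanishes and the mean approaches $\mu_Y/\nu_0^2$, the static response to a constant load $\mu_Y$.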


2.7 Convergence of Sequences of Random Variables 

Let $X$ and $X_n$, $n = 1, 2, \ldots$, be real-valued random variables defined on a probability space $(\Omega, \mathcal{F}, P)$. There are various definitions for the convergence $X_n \to X$ depending on the manner in which the discrepancy between $X_n$ and $X$ is measured.

Fig. 2.1 Relations between the convergence of sequences of random variables



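Definition 2.28 The sequence $\{X_n\}$ converges to $X$ almost surely ($X_n \xrightarrow{\text{a.s.}} X$), in probability ($X_n \xrightarrow{\text{pr}} X$), in distribution ($X_n \xrightarrow{d} X$), and in $L_p$ ($X_n \xrightarrow{\text{m.p.}} X$), where $p \ge 1$ is an integer, if

$\lim_{n \to \infty} X_n(\omega) = X(\omega)$, $\forall \omega \in \Omega \setminus N$ with $P(N) = 0$,
$\lim_{n \to \infty} P(|X_n - X| > \varepsilon) = 0$, $\forall \varepsilon > 0$,
$\lim_{n \to \infty} F_n(x) = F(x)$ at continuity points $x \in \mathbb{R}$ of $F$, and
$\lim_{n \to \infty} E[|X_n - X|^p] = 0$, (2.26)

respectively, where $F_n$ and $F$ denote the distributions of $X_n$ and $X$. The convergence $X_n \xrightarrow{\text{m.p.}} X$ for $p = 2$ is called mean square (m.s.) convergence and is denoted by $X_n \xrightarrow{\text{m.s.}} X$ or $\text{l.i.m.}_{n \to \infty} X_n = X$.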

Figure 2.1 adapted from [7] (Sect. 2.13) gives essential relations between various 
types of convergence. An extensive discussion on relationships between convergence 
types can be found, for example, in [4] (Chap. 4) and [11] (Sects. 6.3 and 8.5). We only 
discuss some of the arrows in Fig. 2.1. That m.s. convergence implies convergence 
in probability follows from the Chebyshev inequality. 

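The convergence $X_n \xrightarrow{\text{a.s.}} X$ means $P(|X_n - X| > \varepsilon \text{ i.o.}) = 0$ for $\varepsilon > 0$ arbitrary, since it requires that $|X_n(\omega) - X(\omega)|$ gets small and remains small for almost all $\omega \in \Omega$. We have $0 = P(\limsup_{n\to\infty} \{|X_n - X| > \varepsilon\}) = \lim_{n\to\infty} P(\cup_{m \ge n} \{|X_m - X| > \varepsilon\}) \ge \lim_{n\to\infty} P(|X_n - X| > \varepsilon)$, that is, $X_n \xrightarrow{\text{pr}} X$.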

To show that the convergence in distribution is implied by that in probability, note that for $\varepsilon > 0$ arbitrary, we have

$$\begin{aligned}
P(X_n \le x) &= P(X_n \le x, |X_n - X| \le \varepsilon) + P(X_n \le x, |X_n - X| > \varepsilon)\\
&\le P(X_n \le x, |X_n - X| \le \varepsilon) + P(|X_n - X| > \varepsilon)
\end{aligned}$$

and $P(X_n \le x, |X_n - X| \le \varepsilon) \le P(X_n \le x, X \le X_n + \varepsilon) \le P(X \le x + \varepsilon)$, which gives $P(X_n \le x) \le P(X \le x + \varepsilon) + P(|X_n - X| > \varepsilon)$. This inequality and a similar inequality obtained by interchanging $X$ with $X_n$ yield $P(X \le x - \varepsilon) - P(|X_n - X| > \varepsilon) \le P(X_n \le x) \le P(X \le x + \varepsilon) + P(|X_n - X| > \varepsilon)$, so that $\lim_{n\to\infty} P(X_n \le x) = P(X \le x)$ provided the distribution of $X$ is continuous at $x$.

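Example 2.30 Let $\{X_n\}$ be a sequence of random variables converging in probability to $X$. Then $\{X_n\}$ is a Cauchy sequence in probability, that is, for arbitrary $\varepsilon > 0$ and $\eta > 0$ there exists $n(\varepsilon, \eta)$ such that $P(|X_m - X_n| > \varepsilon) < \eta$ for $m, n \ge n(\varepsilon, \eta)$. ◊

Proof Since $|X_m - X_n| \le |X_m - X| + |X_n - X|$, we have $P(|X_m - X_n| > \varepsilon) \le P(|X_m - X| + |X_n - X| > \varepsilon) \le P(|X_m - X| > \varepsilon/2) + P(|X_n - X| > \varepsilon/2)$. Because $P(|X_m - X| > \varepsilon/2)$ and $P(|X_n - X| > \varepsilon/2)$ converge to 0 as $m, n \to \infty$, the sequence $\{X_n\}$ is Cauchy in probability. ▲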

Example 2.31 If $\{X_k\}$ are uncorrelated random variables with finite mean $\mu$ and variance $\sigma^2$, then $\sum_{k=1}^n (X_k - \mu)/n \xrightarrow{\text{pr}} 0$ as $n \to \infty$. The convergence is referred to as the weak law of large numbers ([9], p. 36). ◊

Example 2.32 If $\{X_k\}$ are independent identically distributed (iid) random variables with finite expectation, then $\sum_{k=1}^n X_k/n \xrightarrow{\text{a.s.}} E[X_1]$ as $n \to \infty$. The convergence is referred to as the strong law of large numbers ([11], Sects. 7.4 and 7.5). It shows that averages along almost all infinite samples of the sequence $(X_1, X_2, \ldots)$ are equal to $E[X_1]$. ◊

Example 2.33 If $\{X_k\}$ are iid random variables with finite mean $\mu = E[X_1]$ and variance $\sigma^2 = E[(X_1 - \mu)^2]$, then $(1/\sqrt{n}) \sum_{k=1}^n (X_k - \mu)/\sigma \xrightarrow{d} N(0, 1)$ as $n \to \infty$. The convergence is referred to as the central limit theorem ([11], Sect. 8.2). ◊
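The laws of large numbers and the central limit theorem in the preceding examples can be illustrated by simulation. The following sketch uses iid Uniform(0, 1) variables, an assumed choice with $\mu = 1/2$ and $\sigma^2 = 1/12$; the sample sizes and function names are also illustrative. It compares a long-run average with $\mu$ and the empirical distribution of the standardized sum with the standard Gaussian distribution at one point.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

MU, SIGMA = 0.5, math.sqrt(1.0 / 12.0)  # mean and std of Uniform(0, 1)

def sample_mean(n):
    """Average of n iid Uniform(0, 1) draws (law of large numbers)."""
    return sum(random.random() for _ in range(n)) / n

def standardized_sum(n):
    """(1/sqrt(n)) * sum_k (X_k - mu)/sigma (central limit theorem)."""
    return sum((random.random() - MU) / SIGMA for _ in range(n)) / math.sqrt(n)

lln_error = abs(sample_mean(100_000) - MU)

z = [standardized_sum(50) for _ in range(5_000)]
frac_below_1 = sum(v <= 1.0 for v in z) / len(z)   # empirical P(Z_n <= 1)
phi_1 = statistics.NormalDist().cdf(1.0)           # Gaussian limit value

print(lln_error, frac_below_1, phi_1)
```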

Theorem 2.10 (Dominated, bounded, and monotone convergence theorems) Let $X_n$, $n = 1, 2, \ldots$, be real-valued random variables defined on a probability space $(\Omega, \mathcal{F}, P)$ such that $\lim_{n\to\infty} X_n = X$ a.s., $A \in \mathcal{F}$, and $Y$ a random variable defined on the same probability space. If

(1) $|X_n| \le Y$ a.s. and $Y$ is $P$-integrable over $A$, or

(2) $|X_n| \le c$ a.s. for a positive constant $c$, or

(3) $X_n \ge 0$ a.s. is an increasing sequence that can take on the value $+\infty$, then

$$\lim_{n\to\infty} \int_A X_n\, dP = \int_A X\, dP, \tag{2.27}$$

an equality referred to as the dominated convergence, the bounded convergence, or the monotone convergence under condition (1), condition (2), or condition (3), respectively ([4], Sect. 3.2).

The interchange of limit and integral operators in (2.27) resembles a property of Riemann integrals. For example, if the sequence of real-valued functions $\{h_n(x)\}$ converges uniformly to $h(x)$ on $[a, b]$, we have $\int_a^b h(x)\, dx = \int_a^b \lim_{n\to\infty} h_n(x)\, dx = \lim_{n\to\infty} \int_a^b h_n(x)\, dx$, where $\int_a^b (\cdot)\, dx$ denotes a Riemann integral ([2], Theorem 30.3).

Theorem 2.11 (Integration term by term, Fatou's lemma, Lebesgue's theorem) If $\{X_n\}$ and $Y$ are random variables on a probability space $(\Omega, \mathcal{F}, P)$ and $A \in \mathcal{F}$, the following three statements hold.

If $\sum_n \int_A |X_n|\, dP < \infty$, then $\int_A \sum_n X_n\, dP = \sum_n \int_A X_n\, dP$;

If $X_n \ge 0$ a.s. on $A$, then $\int_A (\liminf_n X_n)\, dP \le \liminf_n \int_A X_n\, dP$; and

If $|X_n| \le Y$ where $Y \ge 0$ a.s. and $P$-integrable over $A$, then

$$\int_A (\liminf_n X_n)\, dP \le \liminf_n \int_A X_n\, dP \le \limsup_n \int_A X_n\, dP \le \int_A (\limsup_n X_n)\, dP. \tag{2.28}$$


Proof The proof of the first two statements in (2.28), that is, integration term by term and Fatou's lemma, can be found in [4] (Sect. 3.2). We only prove the last statement, that is, Lebesgue's theorem. Recall that $\wedge_{k \ge n} X_k$ and $\vee_{k \ge n} X_k$ are increasing and decreasing sequences and that $\liminf_{n\to\infty} X_n = \sup_{n \ge 1} \inf_{k \ge n} X_k = \lim_{n\to\infty} \wedge_{k \ge n} X_k$ and $\limsup_{n\to\infty} X_n = \inf_{n \ge 1} \sup_{k \ge n} X_k = \lim_{n\to\infty} \vee_{k \ge n} X_k$. Note that $\vee_{k \ge n} X_k = -\wedge_{k \ge n} (-X_k)$ so that $\limsup_{n\to\infty} X_n = -\liminf_{n\to\infty} (-X_n)$. We also have $\{\vee_n X_n \le x\} = \cap_n \{X_n \le x\} \in \mathcal{F}$ and $\{\wedge_n X_n \ge x\} = \cap_n \{X_n \ge x\} \in \mathcal{F}$.

The Fatou lemma applied to sequence $\{X_n + Y\}$, $X_n + Y \ge 0$ a.s., gives

$$\int_A (\liminf_n X_n)\, dP + \int_A Y\, dP \le \liminf_n \int_A X_n\, dP + \int_A Y\, dP$$

so that $\int_A (\liminf_n X_n)\, dP \le \liminf_n \int_A X_n\, dP$. The last inequality in (2.28) results from $\sup\{X_n, X_{n+1}, \ldots\} = -\inf\{-X_n, -X_{n+1}, \ldots\}$. The middle inequality in Lebesgue's theorem is valid since $\int_A X_n\, dP$ is a numerical sequence. ▲

Example 2.34 If $\{X_n\}$ is a sequence of real-valued random variables converging a.s. to $X$ and there exists a random variable $Z \ge 0$ a.s. such that $|X_n| \le Z$, then $\lim_{n\to\infty} E[X_n] = E[\lim_{n\to\infty} X_n] = E[X]$. This convergence results by Lebesgue's theorem with $A = \Omega$. ◊

We conclude this section by noting that the family of random variables $\{X\}$ defined on a probability space $(\Omega, \mathcal{F}, P)$ with the property $E[|X|^p] < \infty$ constitutes a vector space that, with the norm $\|X\| = (E[|X|^p])^{1/p}$, becomes a Banach space denoted by $L_p(\Omega, \mathcal{F}, P)$, where $p \ge 1$ is an integer; for $p = 2$ this space is a Hilbert space. Properties of $L_p(\Omega, \mathcal{F}, P)$ relevant to our discussion are in Sect. B.5.


2.8 Radon-Nikodym Derivative 


A direct application of Radon-Nikodym derivatives is the construction of improved 
Monte Carlo simulation algorithms for estimating properties of random elements. 

Definition 2.29 Let $(\Omega, \mathcal{F})$ be a measurable space and let $\mu, \nu : \mathcal{F} \to [0, \infty]$ be measures on this space. If $\mu(A) = 0$, $A \in \mathcal{F}$, implies $\nu(A) = 0$, we say that $\nu$ is absolutely continuous with respect to $\mu$ and indicate this property by the notation $\nu \ll \mu$. If $\nu \ll \mu$ and $\mu \ll \nu$, then $\nu$ and $\mu$ are said to be equivalent measures.

Example 2.35 Consider a measure space $(\Omega, \mathcal{F}, \mu)$ and a measurable function $h : (\Omega, \mathcal{F}) \to ([0, \infty), \mathcal{B}([0, \infty)))$. Then

$$\nu(A) = \int_A h\, d\mu, \quad A \in \mathcal{F}, \tag{2.29}$$

is a measure that is absolutely continuous with respect to $\mu$. ◊

Proof The set function $\nu$ is positive by definition. It is countably additive since for $A_n \in \mathcal{F}$, $n = 1, 2, \ldots$, disjoint sets, we have

$$\nu(\cup_{n=1}^\infty A_n) = \int_{\cup_{n=1}^\infty A_n} h\, d\mu = \int_\Omega \sum_{n=1}^\infty h\, 1_{A_n}\, d\mu = \sum_{n=1}^\infty \int_\Omega h\, 1_{A_n}\, d\mu = \sum_{n=1}^\infty \int_{A_n} h\, d\mu = \sum_{n=1}^\infty \nu(A_n).$$

The term by term integration is valid whether $\sum_n \int_\Omega h\, 1_{A_n}\, d\mu$ is or is not finite since $h$ is positive. The measure $\nu$ is absolutely continuous with respect to $\mu$ since

$$\nu(A) = \int_\Omega h\, 1_A\, d\mu \le \sup_{\omega \in A} (h(\omega))\, \mu(A)$$

so that $\mu(A) = 0$ implies $\nu(A) = 0$ with the convention $0 \cdot \infty = 0$. ▲

A converse of this example is provided by the following theorem, referred to as the Radon-Nikodym theorem, which guarantees the existence of a measurable function $h$ satisfying (2.29) for given measures $\mu$ and $\nu$ under some conditions.

Theorem 2.12 If $\mu$ and $\nu$ are $\sigma$-finite measures on a measurable space $(\Omega, \mathcal{F})$ such that $\nu \ll \mu$, then there exists a measurable function

$$h = \frac{d\nu}{d\mu} : (\Omega, \mathcal{F}) \to ([0, \infty), \mathcal{B}([0, \infty))), \tag{2.30}$$

called the Radon-Nikodym derivative of $\nu$ with respect to $\mu$, such that (2.29) holds ([5], Theorem 18, p. 116).
Example 2.36 Let $X$ be a real-valued random variable defined on a probability space $(\Omega, \mathcal{F}, P)$. The measure $Q$ induced on $(\mathbb{R}, \mathcal{B})$ by $P$, that is, the distribution of $X$, is $Q(B) = P(X^{-1}(B))$, $B \in \mathcal{B}$. Set $F(x) = Q((-\infty, x])$ for $B = (-\infty, x]$, $x \in \mathbb{R}$.

Suppose $Q$ is absolutely continuous with respect to the Lebesgue measure $\lambda$ on the real line, that is, $\lambda(B) = 0$ implies $Q(B) = 0$ for all $B \in \mathcal{B}$. Since $Q$ is a probability measure, it is finite and, hence, $\sigma$-finite. Theorem 2.12 shows that there exists a Radon-Nikodym derivative $h = dQ/d\lambda : (\mathbb{R}, \mathcal{B}) \to ([0, \infty), \mathcal{B}([0, \infty)))$ such that $Q(B) = \int_B h\, d\lambda$ for all $B \in \mathcal{B}$. For $B = (-\infty, x]$ we have $F(x) = Q(B) = \int_{-\infty}^x f(u)\, d\lambda(u)$ with the notation $h = f$. The function $f(x) = dF(x)/dx$ is called the probability density function or the density of $X$. ◊
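The remark at the start of this section, that Radon-Nikodym derivatives can improve Monte Carlo algorithms, can be sketched with a simple importance sampling estimator. The setting is an assumption made for illustration: $P$ is the law of $G \sim N(0, 1)$, $Q$ is the law of $N(c, 1)$, and the target is the small probability $P(G > c)$. Since the two laws are equivalent, $dP/dQ(x) = \phi(x)/\phi(x - c) = \exp(-cx + c^2/2)$, and $P(G > c) = E_Q[1(X > c)\, dP/dQ(X)]$. The function name `tail_prob_importance` is hypothetical.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is reproducible

def tail_prob_importance(c, n):
    """Estimate P(G > c), G ~ N(0, 1), from n samples of N(c, 1).

    Each sample x is weighted by the Radon-Nikodym derivative
    dP/dQ(x) = exp(-c*x + c*c/2) of N(0, 1) with respect to N(c, 1).
    """
    total = 0.0
    for _ in range(n):
        x = random.gauss(c, 1.0)
        if x > c:
            total += math.exp(-c * x + 0.5 * c * c)
    return total / n

c = 4.0
estimate = tail_prob_importance(c, 50_000)
exact = 1.0 - statistics.NormalDist().cdf(c)
print(estimate, exact)
```

A direct estimator based on samples of $N(0, 1)$ would see the event $\{G > 4\}$ only a few times in 50,000 draws, whereas about half of the samples drawn under $Q$ fall in the region of interest.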


2.9 Distribution and Density Functions 

Let $X$ be an $\mathbb{R}^d$-valued random variable defined on a probability space $(\Omega, \mathcal{F}, P)$. For $u, v \in \mathbb{R}^d$ we use the notation $u \le v$ ($u < v$) to mean $u_i \le v_i$ ($u_i < v_i$) for all $i = 1, \ldots, d$. Similarly, $(-\infty, u]$ denotes the rectangle $\times_{i=1}^d (-\infty, u_i]$. We have seen that the distribution $Q$ of $X$ is the probability measure induced on $(\mathbb{R}^d, \mathcal{B}^d)$, that is, $Q(B) = P(X^{-1}(B))$, $B \in \mathcal{B}^d$ (Definition 2.12). It is common to view distributions of random variables as probability measures $Q$ defined for intervals $B = \times_{i=1}^d (-\infty, x_i]$. This definition of $Q$ is not restrictive ([11], Corollary 3.2.1 and Proposition 3.2.4).

Definition 2.30 The distribution function of $X$ is

$$F(x) = P(X^{-1}((-\infty, x])) = P \circ X^{-1}((-\infty, x]) = Q((-\infty, x]), \quad x \in \mathbb{R}^d, \tag{2.31}$$

where $Q = P \circ X^{-1}$ is the probability measure induced by $P$ on $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$.

Definition 2.31 Let $F$ be the distribution of an $\mathbb{R}^d$-valued random variable $X$ that is absolutely continuous with respect to the Lebesgue measure $\lambda$ on $\mathbb{R}^d$. The Radon-Nikodym derivative of $F$ with respect to $\lambda$, referred to as the density of $X$, exists and is given by

$$f(x) = \frac{\partial^d F(x)}{\partial x_1 \cdots \partial x_d}, \quad x \in \mathbb{R}^d. \tag{2.32}$$

Since $P(\cap_{i=1}^d \{X_i \in (x_i, x_i + dx_i]\}) \simeq f(x)\, dx$, the volume of $f$ over an infinitesimal rectangle $\times_{i=1}^d (x_i, x_i + dx_i]$ gives the probability that $X$ takes values in this rectangle.

Definition 2.32 The distribution of one or more coordinates of $X$ can be obtained from the distribution or the density of $X$. For example, the distribution and the density of $X_1$ are $F_1(x_1) = F(x_1, \infty, \ldots, \infty)$ and $f_1(x_1) = \int_{\mathbb{R}^{d-1}} f(x)\, dx_2 \cdots dx_d$ or $f_1(x_1) = dF_1(x_1)/dx_1$, respectively.


Let $X$ be a real-valued random variable. Following is a list of properties of the distribution function $F$ of $X$ that are useful for calculations. $F$ is a right continuous, increasing function with range $[0, 1]$; $F$ can have only jump discontinuities and the set of these jumps is countable; $F$ is continuous at $x \in \mathbb{R}$ if and only if $P(X = x) = 0$; $\lim_{x\to\infty} F(x) = 1$; $\lim_{x\to-\infty} F(x) = 0$; $P(a < X \le b) = F(b) - F(a) \ge 0$ for $a \le b$; and $P(a \le X < b) = F(b) - F(a) + P(X = a) - P(X = b)$ for $a \le b$. These facts follow from the properties of probability measures and real-valued functions (Exercise 2.17, [7], Sect. 2.10.1, [11], Sect. 2.1).

If the distribution $F$ of a real-valued random variable $X$ is absolutely continuous with respect to the Lebesgue measure on the real line, it has a density $f(x) = dF(x)/dx$ with the following properties: $F(b) - F(a) = \int_a^b f(x)\, dx$ for $a \le b$, $f(x) = F'(x)$ so that $\int_{-\infty}^x f(\xi)\, d\xi = F(x)$, $f \ge 0$ since $F$ is an increasing function, and $\int_{-\infty}^\infty f(x)\, dx = 1$. Note that $f$ is not a probability measure.

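Example 2.37 Let $X$ be a real-valued random variable defined on a probability space $(\Omega, \mathcal{F}, P)$ and let $g : (\mathbb{R}, \mathcal{B}) \to (\mathbb{R}, \mathcal{B})$ be a measurable function. If $Y = g \circ X$, a random variable on $(\Omega, \mathcal{F}, P)$, has finite expectation, then

$$E[Y] = \int_\Omega Y(\omega)\, P(d\omega) = \int_\Omega g(X(\omega))\, P(d\omega) = \int_{\mathbb{R}} g(x)\, Q(dx) = \int_{\mathbb{R}} g(x)\, dF(x) = \int_{\mathbb{R}} g(x) f(x)\, dx, \tag{2.33}$$

where $Q(B) = P(X^{-1}(B))$, $B \in \mathcal{B}$, is the distribution of $X$. ◊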

Proof If $g = 1_B$, $B \in \mathcal{B}$, then $E[Y] = P(X \in B) = Q(B)$. If $g = \sum_{i \in I} b_i 1_{B_i}$, the subsets $B_i \in \mathcal{B}$ partition $\mathbb{R}$, $I$ is a finite index set, and $b_i$ are real constants, then $E[Y] = \sum_{i \in I} b_i P(X \in B_i) = \sum_{i \in I} b_i Q(B_i)$ since integration is a linear operator. If $g$ is an arbitrary positive Borel function, there exists a sequence of simple, increasing, and measurable functions $g_n$, $n = 1, 2, \ldots$, converging to $g$ as $n \to \infty$.

The expectations of $g_n(X)$ calculated by all formulas in (2.33) coincide, so that (2.33) holds also for $g(X)$ by the monotone convergence theorem. If $g$ is an arbitrary Borel function, (2.33) holds since $g = g^+ - g^-$ and $\int g(X)\, dP = \int g^+(X)\, dP - \int g^-(X)\, dP$. ▲

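Example 2.38 A real-valued random variable $X$ with density

$$f(x) = \frac{1}{\sqrt{2\pi}\, \sigma} \exp\left[-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right], \quad x, \mu \in \mathbb{R},\ \sigma > 0, \tag{2.34}$$

is said to be Gaussian with mean $\mu$, variance $\sigma^2$, and standard deviation $\sigma$, a property denoted by $X \sim N(\mu, \sigma^2)$. The function $\phi(u) = \exp(-u^2/2)/\sqrt{2\pi}$, $u \in \mathbb{R}$, is the density of the standard Gaussian variable $N(0, 1)$. ◊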


Example 2.39 Consider a Cauchy random variable $X$ with density $f(x) = a/[\pi(a^2 + x^2)]$, $x \in \mathbb{R}$, where $a > 0$ is a constant. The expectation of $X$ does not exist since $E[X^+] = E[1_{X > 0}\, X] = \int_0^\infty x f(x)\, dx = a \log(a^2 + x^2)/(2\pi)\, \big|_0^\infty = +\infty$ and $E[X^-] = +\infty$. ◊

Example 2.40 The distribution of $X = F^{-1} \circ \Phi(G)$ is $F$, where $F$ is a continuous distribution, $\Phi$ denotes the distribution of $N(0, 1)$, that is, $\Phi(x) = \int_{-\infty}^x \phi(u)\, du$, and $G \sim N(0, 1)$. ◊

Proof Since $X$ is a continuous function of $G$, it is a random variable. We have $P(X \le x) = P(F^{-1} \circ \Phi(G) \le x) = P(G \le \Phi^{-1}(F(x))) = F(x)$, $x \in \mathbb{R}$. Similar arguments show that $\Phi(G) \sim U(0, 1)$ is a random variable uniformly distributed in $(0, 1)$, that is, $P(\Phi(G) \le x) = x$, $x \in [0, 1]$. The representation $X = F^{-1} \circ \Phi(G)$ is used to generate samples of random variables following arbitrary distributions. ▲
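The sampling use of the representation $X = F^{-1} \circ \Phi(G)$ can be sketched as follows. The target distribution is an assumed example, the exponential distribution $F(x) = 1 - e^{-rx}$ with $F^{-1}(p) = -\log(1 - p)/r$; the rate value and the names `translate` and `exp_inv_cdf` are illustrative choices. The sample mean and the empirical distribution at one point are compared with $1/r$ and $F(1)$.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the run is reproducible
std_normal = statistics.NormalDist()
RATE = 1.5      # assumed rate r of the exponential target

def exp_inv_cdf(p, rate=RATE):
    """Inverse of F(x) = 1 - exp(-rate*x); p is clipped away from 1."""
    p = min(p, 1.0 - 1e-12)
    return -math.log(1.0 - p) / rate

def translate(g, inv_cdf):
    """Map a standard Gaussian value g to F^{-1}(Phi(g))."""
    return inv_cdf(std_normal.cdf(g))

samples = [translate(random.gauss(0.0, 1.0), exp_inv_cdf) for _ in range(40_000)]

sample_mean = sum(samples) / len(samples)
empirical_cdf_at_1 = sum(x <= 1.0 for x in samples) / len(samples)
target_cdf_at_1 = 1.0 - math.exp(-RATE * 1.0)
print(sample_mean, 1.0 / RATE, empirical_cdf_at_1, target_cdf_at_1)
```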

Suppose now that $X$ is an $\mathbb{R}^d$-valued random variable and $d > 1$. As for the case $d = 1$, we list the properties of the distribution function $F$ of $X$: $\lim_{x_k\to\infty} F(x)$, $1 \le k \le d$, is the joint distribution of $(X_1, \ldots, X_{k-1}, X_{k+1}, \ldots, X_d)$; $\lim_{x_k\to-\infty} F(x) = 0$ for $k \in \{1, \ldots, d\}$; the function $x_k \mapsto F(x)$ is increasing for each $k \in \{1, \ldots, d\}$; and the function $x_k \mapsto F(x)$ is right continuous for each $k \in \{1, \ldots, d\}$.

Definition 2.33 Let $X$ be an $\mathbb{R}^d$-valued random variable with density $f$. Denote by $X^{(1)}$ and $X^{(2)}$ the first $d_1 < d$ and the last $d_2 = d - d_1$ coordinates of $X$ for $d \ge 2$. The conditional density $f^{(1|2)}$ of $X^{(1)}$ given $X^{(2)} = z$ is

$$f^{(1|2)}(x^{(1)} \mid z) = \frac{f(x^{(1)}, z)}{f^{(2)}(z)}, \tag{2.35}$$

where $f^{(k)}$ denotes the density of $X^{(k)}$, $k = 1, 2$, and $x^{(1)} = (x_1, \ldots, x_{d_1})$. If $f^{(1|2)}(x^{(1)} \mid z) = f^{(1)}(x^{(1)})$, then $X^{(1)}$ and $X^{(2)}$ are independent.

The conditional probability $P(A \mid B)$ in (2.5) provides a heuristic interpretation for the conditional density in (2.35). Since $P(A \mid B) \simeq f^{(1|2)}(x^{(1)} \mid z)\, dx^{(1)}$ for $A = \{X_1 \in (x_1, x_1 + dx_1], \ldots, X_{d_1} \in (x_{d_1}, x_{d_1} + dx_{d_1}]\}$ and $B = \{X_{d_1+1} \in (z_1, z_1 + dz_1], \ldots, X_d \in (z_{d_2}, z_{d_2} + dz_{d_2}]\}$, $f^{(1|2)}(x^{(1)} \mid z)\, dx^{(1)}$ represents the probability that $X^{(1)}$ is in the infinitesimal rectangle $(x_1, x_1 + dx_1] \times \cdots \times (x_{d_1}, x_{d_1} + dx_{d_1}]$ under the condition $X^{(2)} = z$. A rigorous discussion on this topic can be found in [5] (Sect. 21.3, pp. 416-417).

Definition 2.34 An $\mathbb{R}^d$-valued random variable $X$ is said to be Gaussian with mean vector $\mu$ and covariance matrix $\gamma$ if it has the density

$$f(x) = [(2\pi)^d \det(\gamma)]^{-1/2} \exp\left[-\frac{1}{2}(x - \mu)' \gamma^{-1} (x - \mu)\right], \quad x \in \mathbb{R}^d, \tag{2.36}$$

where $(\cdot)'$ denotes matrix transposition. We use the notation $X \sim N(\mu, \gamma)$ to indicate that $X$ has this property. If $d = 2$, $\mu_1 = \mu_2 = 0$, $\gamma_{1,1} = \gamma_{2,2} = 1$, and $\gamma_{1,2} = \gamma_{2,1} = \rho$, $\rho \in (-1, 1)$, the density $f$ is denoted by $\phi(\cdot, \cdot; \rho)$ and has the expression

$$\phi(x_1, x_2; \rho) = \frac{1}{2\pi\sqrt{1 - \rho^2}} \exp\left[-\frac{x_1^2 - 2\rho x_1 x_2 + x_2^2}{2(1 - \rho^2)}\right], \quad (x_1, x_2) \in \mathbb{R}^2, \tag{2.37}$$

and is referred to as the density of the standard bivariate Gaussian vector. The parameter $\rho$ is the correlation coefficient between the coordinates of $X$.

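Example 2.41 Let $X$ be a bivariate random vector with the density in (2.37). The conditional density of $X_1 \mid (X_2 = z)$ is

$$f^{(1|2)}(x_1 \mid z) = \frac{1}{\sqrt{2\pi(1 - \rho^2)}} \exp\left[-\frac{(x_1 - \rho z)^2}{2(1 - \rho^2)}\right]. \tag{2.38}$$

This density becomes $f^{(1|2)}(x_1 \mid z) = \phi(x_1)$ for $\rho = 0$, showing that $X_1$ and $X_2$ are independent for this value of $\rho$. ◊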

Example 2.42 Let $X$ be an $\mathbb{R}^d$-valued random variable with density $f_x$. Let $Y = g(X)$ where $g : \mathbb{R}^d \to \mathbb{R}^d$ is a measurable function defining a one-to-one mapping between $X$ and $Y$. The density $f_y$ of $Y$ is given by

$$f_y(y) = f_x(g^{-1}(y))\, |J|, \quad x, y \in \mathbb{R}^d, \tag{2.39}$$

where $J = \{\partial x_i/\partial y_j,\ i, j = 1, \ldots, d\}$ denotes the Jacobian matrix and $|J|$ is the absolute value of its determinant.

Proof Since $x \mapsto y = g(x)$ is a one-to-one mapping, $J$ is nonzero everywhere and so is the Jacobian of the inverse mapping $y \mapsto x = g^{-1}(y)$. Let $D_x$ be a neighborhood of $x \in \mathbb{R}^d$ and let $D_y = \{\eta \in \mathbb{R}^d : \eta = g(\xi), \xi \in D_x\}$ denote the image of $D_x$ by the transformation $x \mapsto y = g(x)$. The equality $P(X \in D_x) = P(Y \in D_y)$ can be written as $\int_{D_x} f_x(\xi)\, d\xi = \int_{D_y} f_y(\eta)\, d\eta$, or

$$\int_{D_y} f_x(g^{-1}(\eta))\, |J|\, d\eta = \int_{D_y} f_y(\eta)\, d\eta$$

by a change of variables. The last equality gives (2.39).

If the mapping $y \mapsto x = g^{-1}(y)$ has multiple solutions, we can construct a partition $\{A_v\}$ of the $x$-space such that the mapping $y \mapsto x$ is one-to-one in each $A_v$. The probability mass of $D_y$ is equal to the sum $\sum_v f_v |J_v|$ of the corresponding contributions of the subsets $A_v$, where each term $f_v |J_v|$ is equal to the right side of (2.39) for the restriction of $y \mapsto x = g^{-1}(y)$ to $A_v$. ▲

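Example 2.43 Let $X \sim N(0, 1)$ and $Y = \cos(X)$. The distribution of $Y$ is $P(Y \le y) = 1 - \sum_{k=-\infty}^{\infty} \left(\Phi(2k\pi + \cos^{-1}(y)) - \Phi(2k\pi - \cos^{-1}(y))\right)$ for $|y| < 1$, since $\{Y \ge y\}$ occurs if and only if $X$ belongs to $[2k\pi - \cos^{-1}(y), 2k\pi + \cos^{-1}(y)]$ for some integer $k$, and $P(Y = y) = 0$. ◊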
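The distribution of $Y = \cos(X)$ can be checked by simulation. The event $\{Y \ge y\}$ occurs exactly when $X$ lies in one of the intervals $[2k\pi - \cos^{-1}(y), 2k\pi + \cos^{-1}(y)]$, so the sketch below sums the Gaussian probabilities of these intervals, takes the complement to obtain $P(Y \le y)$, and compares the result with an empirical frequency. The truncation of the sum at $|k| \le 3$, the evaluation point $y = 0.3$, and the function name are assumptions made for illustration.

```python
import math
import random
import statistics

random.seed(5)  # fixed seed so the run is reproducible
Phi = statistics.NormalDist().cdf

def cdf_cos_gauss(y, kmax=3):
    """P(cos(X) <= y) for X ~ N(0, 1) and |y| < 1.

    Sums P(X in [2k*pi - a, 2k*pi + a]), a = arccos(y), over |k| <= kmax
    (terms with larger |k| are negligible) and takes the complement.
    """
    a = math.acos(y)
    p_ge = sum(Phi(2 * k * math.pi + a) - Phi(2 * k * math.pi - a)
               for k in range(-kmax, kmax + 1))
    return 1.0 - p_ge

y = 0.3
n = 100_000
empirical = sum(math.cos(random.gauss(0.0, 1.0)) <= y for _ in range(n)) / n
analytic = cdf_cos_gauss(y)
print(analytic, empirical)
```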

Example 2.44 Let $X \sim N(0, \rho)$ be an $\mathbb{R}^d$-valued standard Gaussian variable with $\rho_{ii} = 1$, and set $Y_i = F_i^{-1} \circ \Phi(X_i)$, where $F_i$ are continuous distributions with densities $f_i$, $i = 1, \ldots, d$. The density of $Y = (Y_1, \ldots, Y_d) \in \mathbb{R}^d$ is

$$f_y(y_1, \ldots, y_d) = [(2\pi)^d \det(\rho)]^{-1/2} \exp\left(-\frac{1}{2} x' \rho^{-1} x\right) \prod_{i=1}^d \frac{f_i(y_i)}{\phi(x_i)}, \tag{2.40}$$

where $x_i = \Phi^{-1} \circ F_i(y_i)$, $i = 1, \ldots, d$. The non-Gaussian vector $Y$ is said to be a translation random vector ([6], Sect. 3.1.1). ◊

Proof The equality $P(\cap_{i=1}^d \{Y_i \le y_i\}) = P(\cap_{i=1}^d \{X_i \le x_i\})$ holds since the mappings $y_i \mapsto x_i = \Phi^{-1} \circ F_i(y_i)$ are invertible. This gives (2.40) by differentiation. Note that (2.40) follows directly from (2.39) since $|J| = \prod_{i=1}^d f_i(y_i)/\phi(x_i)$. ▲


2.10 Characteristic Function 

The characteristic function defines completely the probability law of random vari- 
ables. We describe random variables by their distributions or characteristic functions 
depending on the objective of the analysis. 

Definition 2.35 Let $X$ be an $\mathbb{R}^d$-valued random variable with distribution $F$. The characteristic function of $X$ is

$$\varphi(u) = E[e^{i u' X}] = \int_{\mathbb{R}^d} e^{i u' x}\, dF(x) = \int_{\mathbb{R}^d} e^{i u' x} f(x)\, dx, \quad u \in \mathbb{R}^d, \tag{2.41}$$

where $f$ is the density of $X$, provided it exists, and $u'$ denotes the transpose of $u \in \mathbb{R}^d$. The expectation of the complex-valued random variable $\exp(i u' X)$ is obtained from the expectations of its real and imaginary parts, that is, $E[\exp(i u' X)] = E[\cos(u' X)] + i E[\sin(u' X)]$. The characteristic function is always defined.

Suppose first that $X$ is a real-valued random variable. Following is a list of properties of the characteristic function of $X$ that are useful for calculations: $|\varphi(u)| \le \varphi(0) = 1$ for all $u \in \mathbb{R}$; $\varphi(-u) = \varphi(u)^*$, where $z^*$ denotes the complex conjugate of $z \in \mathbb{C}$; $\varphi$ is positive definite (Exercise 2.23); the characteristic and the density functions are Fourier pairs, that is,

$$\varphi(u) = \int_{\mathbb{R}} e^{i u x} f(x)\, dx \quad \text{and} \quad f(x) = \frac{1}{2\pi} \int_{\mathbb{R}} e^{-i u x} \varphi(u)\, du;$$

$\varphi$ is uniformly continuous in $\mathbb{R}$; and, if $X \in L_q$, then $\varphi \in C^q(\mathbb{R})$ and $\varphi^{(k)}(0) = i^k E[X^k]$ for $k = 1, \ldots, q$. Most of these properties result from the definition of $\varphi$ but the proof of some properties requires technical arguments ([7], Sect. 2.10.3, [11], Sects. 9.2-9.5).

Example 2.45 Consider a Cauchy random variable $X$ with density $f(x) = a/[\pi(a^2 + x^2)]$, $a > 0$, $x \in \mathbb{R}$ (Example 2.39). The characteristic function of $X$ exists and is $\varphi(u) = \exp(-a|u|)$, $u \in \mathbb{R}$. However, $X$ has no mean since $\varphi(u)$ is not differentiable at $u = 0$. ◊

Example 2.46 Let $F(x) = \sum_{k \in I} p_k 1(x_k \le x)$ be the distribution of a real-valued random variable $X$, where $I$ is a finite index set, $p_k > 0$ such that $\sum_{k \in I} p_k = 1$, and $\{x_k\}$ is an increasing sequence of real numbers. The characteristic function of $X$ is $\varphi(u) = \sum_{k \in I} p_k [\cos(u x_k) + i \sin(u x_k)]$ so that it does not vanish as $|u| \to \infty$. ◊
Example 2.47 Let $X = \sum_{k=1}^N Y_k$ be a compound Poisson random variable, where $N$ is a Poisson variable with intensity $\lambda > 0$ and $Y_1, Y_2, \ldots$ are iid random variables that are independent of $N$. The characteristic function of $X$ is

$$\varphi(u) = \exp\left(-\lambda \int_{\mathbb{R}} (1 - e^{i u y})\, dF_y(y)\right) = \exp[-\lambda(1 - \varphi_y(u))], \quad u \in \mathbb{R}, \tag{2.42}$$

where $F_y$ and $\varphi_y$ denote the distribution and the characteristic functions of $Y_1$. ◊

Proof We have $\varphi(u) = E[e^{i u X} 1(N \ge 1) + e^{i u X} 1(N = 0)] = E[e^{i u \sum_{k=1}^N Y_k} 1(N \ge 1)] + P(N = 0)$ and $E[e^{i u \sum_{k=1}^n Y_k} 1(N = n)] = (\varphi_y(u))^n P(N = n)$. This gives $\varphi(u) = e^{-\lambda} \sum_{n=0}^\infty (\lambda \varphi_y(u))^n/n! = \exp[-\lambda(1 - \varphi_y(u))]$ since $P(N = n) = e^{-\lambda} \lambda^n/n!$, $n = 0, 1, \ldots$, is the probability of the Poisson variable $N$. ▲
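Formula (2.42) can be compared with an empirical characteristic function. The sketch assumes, for illustration only, that the jumps $Y_k$ are standard Gaussian, so that $\varphi_y(u) = \exp(-u^2/2)$, and uses a simple multiplication method to sample the Poisson count; the intensity value and all function names are hypothetical choices.

```python
import cmath
import math
import random

random.seed(3)  # fixed seed so the run is reproducible

def poisson_sample(lam):
    """Poisson(lam) count by multiplying uniforms until the product drops
    to exp(-lam) or below; adequate for small lam."""
    threshold = math.exp(-lam)
    count, prod = 0, random.random()
    while prod > threshold:
        count += 1
        prod *= random.random()
    return count

def compound_poisson_sample(lam):
    """X = sum_{k=1}^N Y_k with N ~ Poisson(lam) and Y_k ~ N(0, 1) (assumed)."""
    return sum(random.gauss(0.0, 1.0) for _ in range(poisson_sample(lam)))

def empirical_cf(samples, u):
    """Sample average of exp(i*u*X)."""
    return sum(cmath.exp(1j * u * x) for x in samples) / len(samples)

def exact_cf(lam, u):
    """exp(-lam*(1 - phi_y(u))) with phi_y(u) = exp(-u^2/2)."""
    return cmath.exp(-lam * (1.0 - math.exp(-0.5 * u * u)))

lam, u = 2.0, 1.0
samples = [compound_poisson_sample(lam) for _ in range(20_000)]
cf_hat = empirical_cf(samples, u)
cf_exact = exact_cf(lam, u)
print(cf_hat, cf_exact)
```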

Suppose now that $X$ is an $\mathbb{R}^d$-valued random variable and $d > 1$. Following is a list of relevant properties of the characteristic function $\varphi$ of $X$: $|\varphi(u)| \le \varphi(0) = 1$ for all $u \in \mathbb{R}^d$; the characteristic function and the density of $X$ are Fourier pairs, that is,

$$\varphi(u) = \int_{\mathbb{R}^d} e^{i u' x} f(x)\, dx \quad \text{and} \quad f(x) = \frac{1}{(2\pi)^d} \int_{\mathbb{R}^d} e^{-i u' x} \varphi(u)\, du;$$

$\varphi$ is uniformly continuous; and $\varphi(u) = \prod_{k=1}^d \varphi_k(u_k)$ if $X$ has independent coordinates, where $\varphi_k(u_k) = E[\exp(i u_k X_k)]$.

Example 2.48 Let $X \sim N(\mu, \gamma)$ be an $\mathbb{R}^d$-valued Gaussian variable with density given by (2.36). The characteristic function of $X$ is

$$\varphi(u) = \exp\left(i u' \mu - \frac{1}{2} u' \gamma u\right), \quad u \in \mathbb{R}^d. \tag{2.43}$$

If $\gamma$ is a diagonal matrix, that is, the coordinates of $X$ are uncorrelated, they are also independent since $\varphi(u) = \prod_{k=1}^d \varphi_k(u_k)$, where $\varphi_k(u_k) = E[\exp(i u_k X_k)] = \exp(i u_k \mu_k - \gamma_{kk} u_k^2/2)$ and $X_k \sim N(\mu_k, \gamma_{kk})$. Generally, lack of correlation does not imply independence. However, independence and lack of correlation are equivalent for Gaussian variables. ◊

Example 2.49 Let $X \sim N(\mu, \gamma)$ be an $\mathbb{R}^d$-valued Gaussian variable and set $Y = aX + b$, where $a$ and $b$ are $(d', d)$ and $(d', 1)$ matrices with constant entries. Then $Y$ is an $\mathbb{R}^{d'}$-valued Gaussian variable with mean $a\mu + b$ and covariance matrix $a \gamma a'$, that is, linear transformations of Gaussian vectors are Gaussian vectors. ◊

Proof The mean vector and the covariance matrix of $Y$ can be obtained by direct calculations using the definition of $Y$ and the linearity of the expectation operator. The characteristic function of $Y$ is

$$\varphi_y(u) = E[e^{i u'(aX + b)}] = e^{i u' b}\, E[e^{i (a'u)' X}] = \exp\left(i u'(a\mu + b) - \frac{1}{2} u' (a \gamma a') u\right), \quad u \in \mathbb{R}^{d'},$$

by (2.43), so that $Y$ is an $\mathbb{R}^{d'}$-valued Gaussian variable with the stated properties. ▲
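Example 2.50 Let $X \sim N(\mu, \gamma)$ be an $\mathbb{R}^d$-valued random variable and denote the first $d_1 < d$ and the last $d_2 = d - d_1$ coordinates of $X$ by $X^{(1)}$ and $X^{(2)}$, respectively. The conditional vector $\hat{X} = X^{(1)} \mid (X^{(2)} = z)$ is Gaussian with mean vector $\hat{\mu}$ and covariance matrix $\hat{\gamma}$ given by

$$\begin{aligned}
\hat{\mu} &= \mu^{(1)} + \gamma^{(1,2)} (\gamma^{(2,2)})^{-1} (z - \mu^{(2)}) \quad \text{and}\\
\hat{\gamma} &= \gamma^{(1,1)} - \gamma^{(1,2)} (\gamma^{(2,2)})^{-1} \gamma^{(2,1)},
\end{aligned} \tag{2.44}$$

where $\mu^{(r)} = E[X^{(r)}]$ and $\gamma^{(r,s)} = E[(X^{(r)} - \mu^{(r)})(X^{(s)} - \mu^{(s)})']$, $r, s = 1, 2$. ◊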

Example 2.51 Let $X \sim N(\mu, \gamma)$ be a bivariate Gaussian vector. The random variable

$$\hat{X} = \mu_1 + \frac{\rho \sigma_1}{\sigma_2} (X_2 - \mu_2) \tag{2.45}$$

is the optimal, mean square, linear estimator for $X_1$ given $X_2$, where $\sigma_1^2 = \gamma_{1,1}$, $\sigma_2^2 = \gamma_{2,2}$, and $\rho \sigma_1 \sigma_2 = \gamma_{1,2}$. ◊

Proof Let $Z = aX_2 + b$, $a, b \in \mathbb{R}$, be a linear estimator for $X_1$, and impose the conditions that $Z$ is unbiased and minimizes the mean square error $E[(Z - X_1)^2]$. The first condition implies $E[Z] = \mu_1$ so that $a\mu_2 + b = \mu_1$ and $Z = a(X_2 - \mu_2) + \mu_1$. The mean square error

$$E[(Z - X_1)^2] = E[(a(X_2 - \mu_2) - (X_1 - \mu_1))^2] = a^2 \sigma_2^2 + \sigma_1^2 - 2a\rho\sigma_1\sigma_2$$

of the estimator $Z$ takes its minimum value at $a = \rho\sigma_1/\sigma_2$. ▲


2.11 Conditional Expectation 

Consider a real-valued random variable $X$ defined on a probability space $(\Omega, \mathcal{F}, P)$ such that $E[|X|] < \infty$, that is, the mean of $X$ exists and is finite. Our objective is to define the conditional expectation $E[X \mid \mathcal{G}]$ of $X$ with respect to a sub-$\sigma$-field $\mathcal{G}$ of $\mathcal{F}$. The expectation $E[X \mid \mathcal{G}]$ can be viewed as a local average of $X$ since $\mathcal{G}$ is coarser than $\mathcal{F}$.

Example 2.52 Let $X$ be an $\mathbb{R}^2$-valued Gaussian variable with mean zero and covariances $E[X_i^2] = 1$, $i = 1, 2$, and $E[X_1 X_2] = \rho$, $|\rho| < 1$. The joint density of $X$ and the density of $X_1 \mid (X_2 = z)$ are given by (2.37) and (2.38). These densities show that $X_1 \mid (X_2 = z) \sim N(\rho z, 1 - \rho^2)$ so that $E[X_1 \mid X_2 = z] = \rho z$, a result that can also be obtained from (2.44). ◊

Example 2.53 The sample space, the $\sigma$-field $\mathcal{F}$, and the probability measure for the experiment of rolling two dice are $\Omega = \{\omega = (i, j) : i, j = 1, \ldots, 6\}$, the collection of all parts of $\Omega$, and $P(\{\omega\}) = 1/36$, respectively. Let $X$ be a random variable defined on this space by $X(\omega) = i + j$. The expectation of $X$ is $E[X] = 7$. Let $A_n = \{\omega = (i, j) : i \wedge j = n\}$, $n = 1, \ldots, 6$, be measurable sets partitioning $\Omega$, for example, $A_4 = \{(4, 4), (4, 5), (5, 4), (4, 6), (6, 4)\}$. The probability that an outcome $(i, j)$ is in $A_n$ is equal to the cardinality of $A_n$ divided by 36. Let $\mathcal{G} = \sigma(A_1, \ldots, A_6)$ denote the $\sigma$-field generated by $\{A_n, n = 1, \ldots, 6\}$, so that the members of $\mathcal{G}$ are unions of members of $\{A_n, n = 1, \ldots, 6\}$.

The local averages $E[X \mid A_n]$ of $X$ over $A_n$ can be calculated simply. For example, $E[X \mid A_4] = ((4 + 4) + (4 + 5) + (5 + 4) + (4 + 6) + (6 + 4))(1/5) = 46/5$ by the definition of $X$ and the fact that the members of $A_n$ are equally likely. Similar calculations give the other local averages, for example, $E[X \mid A_6] = 12$. In general, we have

$$E[X \mid A_n] = \sum_{\omega \in A_n} X(\omega) \frac{1}{\text{card}(A_n)} = \frac{1}{\text{card}(A_n)/36} \sum_{\omega \in A_n} X(\omega) \frac{1}{36} = \frac{1}{P(A_n)} \int_{A_n} X\, dP.$$

Hence, the conditional expectation of $X$ with respect to $\mathcal{G}$ is a simple random variable denoted by $E[X \mid \mathcal{G}]$ that takes the values $\{E[X \mid A_n]\}$ with probabilities $\{P(A_n)\}$, that is, the random variable

$$E[X \mid \mathcal{G}] = \sum_{n=1}^6 E[X \mid A_n]\, 1_{A_n}.$$

This random variable is $\mathcal{G}$-measurable and has the properties $\int_A E[X \mid \mathcal{G}]\, dP = \int_A X\, dP$ for all $A \in \mathcal{G}$ and $E\{E[X \mid \mathcal{G}]\} = E[X]$. ◊

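Proof It is obvious that $E[X \mid \mathcal{G}]$ is a random variable on $(\Omega, \mathcal{G}, P)$. The members of $\mathcal{G}$ are unions of $A_n$, so that if, for example, $A = A_k \cup A_l$, $k \ne l$, we have $\int_A E[X \mid \mathcal{G}]\, dP = \sum_{n=1}^6 E[X \mid A_n] \int_{A_k \cup A_l} 1_{A_n}\, dP = E[X \mid A_k] P(A_k) + E[X \mid A_l] P(A_l)$. Direct calculations give $\int_A X\, dP = \int_A E[X \mid \mathcal{G}]\, dP$ for $A \in \mathcal{G}$ arbitrary. We also have $E\{E[X \mid \mathcal{G}]\} = \sum_{n=1}^6 E[X \mid A_n] P(A_n) = \sum_{n=1}^6 \int_{A_n} X\, dP = \int_\Omega X\, dP = E[X]$. ▲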
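The two-dice calculation can be reproduced exactly with rational arithmetic. The sketch below enumerates the sample space, forms the sets $A_n = \{(i, j) : i \wedge j = n\}$, computes the local averages, and verifies the defining property on one union of these sets as well as $E\{E[X \mid \mathcal{G}]\} = E[X] = 7$. The variable and function names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space of two dice; every outcome has probability 1/36.
omega = list(product(range(1, 7), repeat=2))
prob = Fraction(1, 36)

def X(w):
    """X(omega) = i + j."""
    return w[0] + w[1]

# Partition A_n = {(i, j) : min(i, j) = n}, n = 1, ..., 6.
atoms = {n: [w for w in omega if min(w) == n] for n in range(1, 7)}

# Local averages E[X | A_n]; outcomes in A_n are equally likely.
local_avg = {n: Fraction(sum(X(w) for w in A), len(A)) for n, A in atoms.items()}

def cond_exp(w):
    """E[X | G](omega): constant on each A_n."""
    return local_avg[min(w)]

# Defining property on A = A_2 U A_5, a member of G.
A = atoms[2] + atoms[5]
lhs = sum(cond_exp(w) * prob for w in A)
rhs = sum(X(w) * prob for w in A)

total = sum(cond_exp(w) * prob for w in omega)  # E{E[X | G]}
print(local_avg[4], local_avg[6], lhs == rhs, total)
```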

Example 2.54 Let $X$ and $Y$ be real-valued random variables defined on a probability space $(\Omega, \mathcal{F}, P)$. Suppose $Y$ is discrete taking distinct values $y_i$, $i = 1, 2, \ldots$, so that the sets $B_i = \{Y = y_i\}$ partition $\Omega$. If $P(B_i) > 0$, the expectation of $X$ conditional on $Y$ is a discrete random variable denoted by $E[X \mid Y]$ taking the values $E[X \mid Y](\omega) = E[X \mid B_i]$ for $\omega \in B_i$, where $E[X \mid B_i] = E[X \mid Y = y_i] = \int_{B_i} x\, dF(x)/P(B_i)$ and $F$ denotes the distribution of $X$. ◊

Proof The distribution of $X$ conditional on $B_i$ is $F(x \mid B_i) = P(X \le x, B_i)/P(B_i)$ so that the conditional expectation of $X$ given $B_i$ can be calculated from $E[X \mid B_i] = \int x\, dF(x \mid B_i) = \int_{B_i} x\, dF(x)/P(B_i)$.

The conditional expectation $E[X \mid Y]$ is equal to the local average of $X$ over $B_i$. It constitutes a coarser version of $X$ that can be viewed as an approximation of this random variable. Since the measurable partition $\{B_i\}$ of $\Omega$ generates the $\sigma$-field $\sigma(Y)$, we may write $E[X \mid \sigma(Y)]$ for $E[X \mid Y]$. The conditional expectation $E[X \mid \sigma(Y)]$ has the same properties as $E[X \mid \mathcal{G}]$ in the previous example. ▲

It is not possible to extend the definition of $E[X \mid Y]$ in Example 2.54 to continuous random variables $Y$ since, for example, $\{\omega \in \Omega : a < Y(\omega) < b\}$, $a < b$, does not belong to the $\sigma$-field generated by the sets $\{\omega \in \Omega : Y(\omega) = y\}$, $y \in \mathbb{R}$. In agreement with an observation in this example, we set $E[X \mid Y]$ to be the conditional expectation $E[X \mid \sigma(Y)]$.

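Definition 2.36 Let $X$ be a real-valued integrable random variable defined on a probability space $(\Omega, \mathcal{F}, P)$, and let $\mathcal{G}$ be a sub-$\sigma$-field of $\mathcal{F}$. The conditional expectation $E[X \mid \mathcal{G}]$ of $X$ with respect to $\mathcal{G}$ is the class of $\mathcal{G}$-measurable functions satisfying the defining relation

$$\int_A X\, dP = \int_A E[X \mid \mathcal{G}]\, dP, \quad \forall A \in \mathcal{G}. \tag{2.46}$$

Note that $E[X \mid \mathcal{G}]$ exists ([4], Theorem 9.1.1) and is such that $E[X \mid \mathcal{G}] = E[X]$ for $\mathcal{G} = \{\emptyset, \Omega\}$, $E[X \mid \mathcal{G}] = X$ for $\mathcal{G} = \mathcal{F}$ (Exercise 2.32), and $E\{E[X \mid \mathcal{G}]\} = E[X]$ by (2.46) with $A = \Omega$.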

Theorem 2.13 Let $X$ be a real-valued integrable random variable defined on a probability space $(\Omega, \mathcal{F}, P)$, $\mathcal{G}$ a sub-$\sigma$-field of $\mathcal{F}$, and $Z$ a real-valued $\mathcal{G}$-measurable function. Then

$$\begin{aligned}
&E[(X - E[X \mid \mathcal{G}])\, Z] = 0, \quad \forall Z \in \mathcal{G}, \text{ and}\\
&E[XZ \mid \mathcal{G}] = Z\, E[X \mid \mathcal{G}] \text{ a.s.}, \quad \forall Z \in \mathcal{G}.
\end{aligned} \tag{2.47}$$

Proof The first equality in (2.47) holds for $Z = 1_A$, $A \in \mathcal{G}$, by the defining relation. It also holds for simple random variables $Z = \sum_n b_n 1_{A_n}$, $A_n \in \mathcal{G}$, since expectation is a linear operator.

The random variables $E[XZ \mid \mathcal{G}]$ and $Z\, E[X \mid \mathcal{G}]$ in the second equality of (2.47) are $\mathcal{G}$-measurable. If $Z = 1_A$, $A \in \mathcal{G}$, (2.47) holds a.s., because for $B \in \mathcal{G}$ the integrals over $B$ of the left and the right sides of this equation are $\int_B E[X 1_A \mid \mathcal{G}]\, dP = \int_B X 1_A\, dP = \int_{A \cap B} X\, dP$ and $\int_B 1_A E[X \mid \mathcal{G}]\, dP = \int_{A \cap B} E[X \mid \mathcal{G}]\, dP = \int_{A \cap B} X\, dP$, respectively, by the defining relation. This equality also holds for simple random variables $Z$ by the linearity of conditional expectation.

The extension of (2.47) to an arbitrary random variable $Z$ results from the representation of $Z$ by a difference of two positive random variables, which can be defined as limits of simple random variables ([4], Sect. 9.1). ▲

Corollary 2.1 The conditional expectation $E[X \mid \mathcal{G}]$ is the projection of $X$ on $\mathcal{G}$ and $X - E[X \mid \mathcal{G}]$ is orthogonal to $\mathcal{G}$. Moreover, $E[X \mid \mathcal{G}]$ represents the best mean square (m.s.) estimator for $X$ given the information content of $\mathcal{G}$.

Proof That $X - E[X \mid \mathcal{G}]$ is orthogonal to $\mathcal{G}$ follows from the first equality in (2.47). Since $E[X \mid \mathcal{G}]$ is the orthogonal projection of $X$ on $\{Z : Z \in \mathcal{G}\}$, it is the best m.s. estimator for $X$ given $\mathcal{G}$ ([8], Sects. 4.3 and 4.4). ▲
Example 2.55 The conditional expectation $E[X \mid \sigma(Z)] = E[X \mid Z]$ is the best m.s. estimator of $X$ with respect to the information content of $\sigma(Z)$, where $X, Z \in L_2(\Omega, \mathcal{F}, P)$. The best m.s. linear estimator of $X$ is

$$\hat{X} = E[X] + \frac{E[XZ] - E[X]E[Z]}{E[Z^2] - E[Z]^2}\, (Z - E[Z]). \tag{2.48}$$

The estimator represents the linear regression of $X$ with respect to $Z$, and becomes $\hat{X} = E[X]$ if $X$ and $Z$ are uncorrelated. ◊

Proof The function $\hat{X} = aZ + b$ is $\sigma(Z)$-measurable for any constants $a, b$. It is the conditional expectation of $X$ with respect to $\sigma(Z)$ if $E[X 1_A] = E[(aZ + b) 1_A]$ for all $A \in \sigma(Z)$ by (2.46), which gives $E[\hat{X}] = E[X] = aE[Z] + b$ for $A = \Omega$. The orthogonality condition in (2.47) implies $E[(X - \hat{X})Z] = 0$ or $E[XZ] = aE[Z^2] + bE[Z]$. The solutions $a, b$ of these equations introduced in $\hat{X} = aZ + b$ give the expression of $\hat{X}$ in (2.48). The resulting estimator has the property that its m.s. error $E[(\hat{X} - X)^2]$ is smaller than the error $E[(aZ + b - X)^2]$ for all the other values of $a$ and $b$. ▲
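The coefficients of the best m.s. linear estimator can be computed from sample moments. In the sketch below the data model ($Z$ Gaussian and $X = 0.5Z - 1$ plus independent noise) is an assumption chosen so that the answer is known; the expectations in the two equations $E[X] = aE[Z] + b$ and $E[XZ] = aE[Z^2] + bE[Z]$ are replaced by sample averages, and the m.s. error is compared with that of perturbed coefficients.

```python
import random

random.seed(4)  # fixed seed so the run is reproducible

# Assumed data model for illustration: X = 0.5*Z - 1 + noise.
n = 50_000
z = [random.gauss(1.0, 2.0) for _ in range(n)]
x = [0.5 * zi - 1.0 + random.gauss(0.0, 1.0) for zi in z]

def mean(values):
    return sum(values) / len(values)

def linear_estimator(x, z):
    """Solve E[X] = a E[Z] + b and E[XZ] = a E[Z^2] + b E[Z] with
    expectations replaced by sample averages."""
    ex, ez = mean(x), mean(z)
    exz = mean([xi * zi for xi, zi in zip(x, z)])
    ezz = mean([zi * zi for zi in z])
    a = (exz - ex * ez) / (ezz - ez * ez)
    b = ex - a * ez
    return a, b

def ms_error(a, b):
    """Sample mean square error of the estimator a*Z + b."""
    return mean([(a * zi + b - xi) ** 2 for xi, zi in zip(x, z)])

a, b = linear_estimator(x, z)
# Sample version of the orthogonality condition E[(X - X_hat) Z] = 0.
orth = mean([(xi - (a * zi + b)) * zi for xi, zi in zip(x, z)])
print(a, b, ms_error(a, b), orth)
```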


The following three theorems give properties of the conditional expectations that are useful for calculations. The properties listed below are similar to those of expectation and hold a.s. As indicated at the beginning of this section, we consider real-valued random variables defined on the same probability space $(\Omega, \mathcal{F}, P)$ and a sub-$\sigma$-field $\mathcal{G}$ of $\mathcal{F}$.

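Theorem 2.14 If $X$ and $X_n$ are integrable random variables, then ([11], Sect. 10.3)

$$
\begin{aligned}
& X \in \mathcal{G} \ \text{implies} \ E[X \mid \mathcal{G}] = X, \\
& E[a X_1 + b X_2 \mid \mathcal{G}] = a\,E[X_1 \mid \mathcal{G}] + b\,E[X_2 \mid \mathcal{G}] && \text{(linearity)}, \\
& X_1 \le X_2 \ \text{implies} \ E[X_1 \mid \mathcal{G}] \le E[X_2 \mid \mathcal{G}] && \text{(monotonicity)}, \\
& |E[X \mid \mathcal{G}]| \le E[\,|X| \mid \mathcal{G}] && \text{(modulus inequality)}, \\
& X_n \uparrow (\downarrow)\, X \ \text{implies} \ E[X_n \mid \mathcal{G}] \uparrow (\downarrow)\, E[X \mid \mathcal{G}] && \text{(monotone convergence)}, \\
& |X_n| \le Y \ \text{a.s.}, \ E[Y] < \infty, \ \text{and} \ X_n \to X \ \text{a.s. imply} \ E[X_n \mid \mathcal{G}] \to E[X \mid \mathcal{G}] && \text{(dominated convergence)}.
\end{aligned}
\tag{2.49}
$$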


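Theorem 2.15 The Cauchy-Schwarz and Jensen inequalities are, respectively, ([4], Sect. 9.1)

$$\left(E[XY \mid \mathcal{G}]\right)^2 \le E[X^2 \mid \mathcal{G}]\,E[Y^2 \mid \mathcal{G}] \quad \text{and} \tag{2.50}$$

$$g\left(E[X \mid \mathcal{G}]\right) \le E[g(X) \mid \mathcal{G}], \ \text{where} \ g : \mathbb{R} \to \mathbb{R} \ \text{is a convex function.} \tag{2.51}$$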

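Theorem 2.16 If $\mathcal{G}_1$ and $\mathcal{G}_2$ are sub-$\sigma$-fields of $\mathcal{F}$ such that $\mathcal{G}_1 \subset \mathcal{G}_2$, then ([4], Sect. 9.1)

$$E[X \mid \mathcal{G}_1] = E[X \mid \mathcal{G}_2] \ \text{if} \ E[X \mid \mathcal{G}_2] \in \mathcal{G}_1 \quad \text{and} \tag{2.52}$$

$$E\{E[X \mid \mathcal{G}_2] \mid \mathcal{G}_1\} = E[X \mid \mathcal{G}_1] = E\{E[X \mid \mathcal{G}_1] \mid \mathcal{G}_2\}. \tag{2.53}$$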


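Definition 2.37 Let $(\Omega, \mathcal{F}, P)$ be a probability space and $\mathcal{G}$ a sub-$\sigma$-field of $\mathcal{F}$. The conditional probability with respect to $\mathcal{G}$ is

$$P(A \mid \mathcal{G}) = E[1_A \mid \mathcal{G}], \quad A \in \mathcal{F}. \tag{2.54}$$

The definition is meaningful since $1_A$ is $\mathcal{F}$-measurable. The random variable $P(A \mid \mathcal{G})$ is integrable, $\mathcal{G}$-measurable, and satisfies $\int_\Lambda P(A \mid \mathcal{G})\,dP = P(A \cap \Lambda)$ for all $\Lambda \in \mathcal{G}$. The latter equality holds since $\int_\Lambda P(A \mid \mathcal{G})\,dP = \int_\Lambda E[1_A \mid \mathcal{G}]\,dP = \int_\Lambda 1_A\,dP$ by the defining relation in (2.46).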

Example 2.56 Let $A$ and $B$ be events on a probability space $(\Omega, \mathcal{F}, P)$ such that $P(B) > 0$ and $P(B^c) > 0$, and consider the sub-$\sigma$-field $\mathcal{G} = \{\emptyset, \Omega, B, B^c\}$ of $\mathcal{F}$. The conditional probability in (2.54) is

$$P(A \mid \mathcal{G}) = E[1_A \mid \mathcal{G}] = E[1_A \mid B]\,1_B + E[1_A \mid B^c]\,1_{B^c}, \tag{2.55}$$

where the latter equality holds by Example 2.53. This shows that $P(A \mid \mathcal{G})$ is a random variable taking the values $P(A \cap B)/P(B)$ and $P(A \cap B^c)/P(B^c)$ with probabilities $P(B)$ and $P(B^c)$, respectively. Note that $P(A \mid \mathcal{G})$ extends the definition of the conditional probability in (2.5). ◊
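
The random variable in (2.55) can be tabulated on a finite sample space. The sketch below assumes the experiment of rolling two fair dice; the events `A` (first die shows 6) and `B` (sum equals 8) and the helper names are illustrative choices, not part of the text.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))   # 36 equally likely outcomes

def prob(event):
    """Probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def cond_prob_given_field(A, B):
    """P(A | G) as a function of the outcome, for G = {empty, Omega, B, B^c}."""
    on_B = prob(lambda w: A(w) and B(w)) / prob(B)
    on_Bc = prob(lambda w: A(w) and not B(w)) / (1 - prob(B))
    return lambda w: on_B if B(w) else on_Bc

A = lambda w: w[0] == 6            # assumed event: first die shows 6
B = lambda w: w[0] + w[1] == 8     # assumed event: sum equals 8
p_A_given_G = cond_prob_given_field(A, B)

# Integrating P(A | G) over Omega recovers P(A), as required by (2.46).
average = sum(p_A_given_G(w) for w in omega) / len(omega)
print(p_A_given_G((6, 2)), p_A_given_G((1, 1)), average)
```

The function takes only two distinct values, one on $B$ and one on $B^c$, which is what makes it measurable with respect to the four-set field.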

Example 2.57 Let $X \ge 0$ a.s. be a random variable defined on a probability space $(\Omega, \mathcal{F}, P)$ and $\mathcal{G}$ a sub-$\sigma$-field of $\mathcal{F}$. If $E[X] < \infty$, the set function $Q(A) = E[X 1_A]$, $A \in \mathcal{G}$, is a finite measure on the measurable space $(\Omega, \mathcal{G})$. The conditional expectation of $X$ with respect to $\mathcal{G}$ is the Radon-Nikodym derivative $E[X \mid \mathcal{G}] = dQ/dP$.

This definition becomes $E[X \mid \mathcal{G}] = E[X^+ \mid \mathcal{G}] - E[X^- \mid \mathcal{G}]$ for random variables with $E[|X|] < \infty$ and $E[X \mid \mathcal{G}] = (E[X_1 \mid \mathcal{G}], \ldots, E[X_d \mid \mathcal{G}])$ for $\mathbb{R}^d$-valued random variables with finite mean. ◊

Proof The defining relation for conditional expectation gives $Q(A) = \int_A X\,dP = \int_A E[X \mid \mathcal{G}]\,dP$, $A \in \mathcal{G}$. Since $Q$ and $P$ are finite measures and $Q$ is absolutely continuous with respect to $P$, the equality $Q(A) = \int_A E[X \mid \mathcal{G}]\,dP$, $A \in \mathcal{G}$, implies $E[X \mid \mathcal{G}] = dQ/dP$ by Theorem 2.12. ▲


2.12 Discrete Time Martingales 

Martingales are essential for constructing stochastic integrals. This section provides a primer on discrete time martingales. An example is used to introduce an elementary version of stochastic integrals.

Definition 2.38 Let $(\Omega, \mathcal{F})$ be a measurable space. An increasing collection $\mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}_n \subset \cdots \subset \mathcal{F}$ of sub-$\sigma$-fields of $\mathcal{F}$ is said to be a filtration in $(\Omega, \mathcal{F})$. A probability space $(\Omega, \mathcal{F}, P)$ endowed with a filtration $(\mathcal{F}_n)_{n \ge 0}$ is called a filtered probability space and is denoted by $(\Omega, \mathcal{F}, (\mathcal{F}_n)_{n \ge 0}, P)$. It is assumed that $\mathcal{F}_0$ contains all the $P$-null sets of $\mathcal{F}$.

Example 2.58 Suppose the sequence $X = (X_1, X_2, \ldots)$ gives outcomes of coin tosses. The information content of the $\sigma$-field $\mathcal{F}_n = \sigma(X_1, \ldots, X_n)$ is sufficient to decide whether an event related to the first $n$ tosses has or has not occurred. For example, the event $A = \{\text{at least 2 heads in the first five tosses}\}$ is $\mathcal{F}_5$-measurable because we can decide after five tosses whether $A$ has or has not occurred. If $\{\text{tail, tail, head, tail}\}$ is a sample of the first four tosses, the event $A$ remains undecided so that $A \notin \mathcal{F}_4$. ◊

Definition 2.39 Let $(\Omega, \mathcal{F})$ and $(\Psi, \mathcal{G})$ be measurable spaces, $(\mathcal{F}_n)_{n \ge 0}$ a filtration on $(\Omega, \mathcal{F})$, and $X = (X_0, X_1, \ldots)$ a sequence of measurable functions from $(\Omega, \mathcal{F})$ to $(\Psi, \mathcal{G})$. The sequence $X$ is said to be adapted to the filtration $(\mathcal{F}_n)_{n \ge 0}$ or $\mathcal{F}_n$-adapted if $X_n$ is $\mathcal{F}_n$-measurable for each $n \ge 0$. The minimal or natural filtration of $X = (X_0, X_1, \ldots)$, that is, the smallest filtration with respect to which $X$ is adapted, is the filtration $\sigma(X_0, X_1, \ldots, X_n)$, $n \ge 0$.

Example 2.59 Let $X = (X_0, X_1, \ldots)$ be a real-valued sequence defined on a probability space $(\Omega, \mathcal{F}, P)$ and set $\mathcal{F}_n = \sigma(X_0, X_1, \ldots, X_n)$. The sequences $Y_n = g(X_n)$ and $Y_n = \max_{0 \le i \le n}\{X_i\}$ are $\mathcal{F}_n$-adapted, where $g : \mathbb{R} \to \mathbb{R}$ is a Borel measurable function. ◊

Definition 2.40 Let $X = (X_0, X_1, X_2, \ldots)$ be a sequence of real-valued random variables defined on a probability space $(\Omega, \mathcal{F}, P)$ endowed with a filtration $(\mathcal{F}_n)_{n \ge 0}$. The sequence $X_0, X_1, X_2, \ldots$ is said to be an $\mathcal{F}_n$-martingale if (1) $E[|X_n|] < \infty$, $n \ge 0$, (2) $X$ is $\mathcal{F}_n$-adapted, and (3) $E[X_n \mid \mathcal{F}_m] = X_m$ for $0 \le m \le n$.

If the equality in the third condition is replaced by $\ge$ and $\le$, then $X$ is said to be an $\mathcal{F}_n$-submartingale and $\mathcal{F}_n$-supermartingale, respectively. If the random variables $X_n$ are in $L_p(\Omega, \mathcal{F}, P)$, $X$ is called a $p$-integrable martingale, submartingale, or supermartingale. If $p = 2$, then $X$ is said to be a square integrable martingale, submartingale, or supermartingale.

Example 2.60 Let $R_n = \sum_{i=1}^n X_i$, $n \ge 1$, and $R_0 = 0$ be a random walk, where $\{X_i\}$ are iid random variables. If the random variables $X_i$ have finite mean, $R = (R_0, R_1, R_2, \ldots)$ is an $\mathcal{F}_n$-supermartingale, martingale, or submartingale depending on the sign of the expectation $E[X_1]$, where $\mathcal{F}_n = \sigma(X_1, \ldots, X_n)$, $n = 1, 2, \ldots$, and $\mathcal{F}_0 = \{\emptyset, \Omega\}$. ◊

Proof Note that $(R_0, R_1, \ldots, R_n)$ and $(X_1, \ldots, X_n)$ are $\mathcal{F}_n$-measurable for $n \ge 1$ so that $R = (R_0, R_1, R_2, \ldots)$ is $\mathcal{F}_n$-adapted. Also, for $n > m \ge 0$, we have $E[R_n \mid \mathcal{F}_m] = R_m + \sum_{i=m+1}^n E[X_i] = R_m + (n - m)\,E[X_1]$ so that $R$ is a supermartingale, martingale, or submartingale if $E[X_1]$ is negative, zero, or positive. ▲

Example 2.61 Let $R = (R_0, R_1, \ldots)$ be as in Example 2.60. If the random variables $X_i$ have finite variance and mean zero, the sequence $S_n = R_n^2 = \sum_{i,j=1}^n X_i X_j$, $n \ge 1$, with $S_0 = 0$ is an $\mathcal{F}_n$-submartingale and $S_n - n E[X_1^2]$ is an $\mathcal{F}_n$-martingale. ◊

Proof The sequences $S_n$ and $S_n - n E[X_1^2]$ have the first two defining properties for martingales. For the third property, note that

$$
\begin{aligned}
E[S_{n+1} \mid \mathcal{F}_n] &= E[(R_n + X_{n+1})^2 \mid \mathcal{F}_n] = E[R_n^2 + 2 R_n X_{n+1} + X_{n+1}^2 \mid \mathcal{F}_n] \\
&= R_n^2 + 2 R_n E[X_{n+1}] + E[X_{n+1}^2] = S_n + E[X_1^2] \ge S_n
\end{aligned}
$$

since $R_n$ is $\mathcal{F}_n$-measurable, $X_{n+1}$ is independent of $\mathcal{F}_n$, and $E[X_{n+1}] = 0$. Hence, $S = (S_0, S_1, \ldots)$ is a submartingale. Since $E[S_{n+1} - (n+1) E[X_1^2] \mid \mathcal{F}_n] = (S_n + E[X_1^2]) - (n+1) E[X_1^2] = S_n - n E[X_1^2]$, $S_n - n E[X_1^2]$ is a martingale. ▲
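
The conditional expectations in Examples 2.60 and 2.61 can be computed exactly for a walk with $\pm 1$ steps by enumerating all future steps. The sketch below assumes such a two-valued step distribution; the function name `conditional_mean` and the particular past `(1, -1, 1)` are illustrative choices.

```python
from itertools import product

def conditional_mean(functional, past, n, values=(1, -1), probs=(0.5, 0.5)):
    """E[functional(X_1..X_n) | X_1..X_m = past], by enumerating the
    n - m future steps of a walk with finitely many step values."""
    total = 0.0
    for future in product(range(len(values)), repeat=n - len(past)):
        weight = 1.0
        steps = list(past)
        for k in future:
            weight *= probs[k]
            steps.append(values[k])
        total += weight * functional(steps)
    return total

walk = lambda steps: sum(steps)                            # R_n
compensated = lambda steps: sum(steps) ** 2 - len(steps)   # S_n - n E[X_1^2], with E[X_1^2] = 1

past = (1, -1, 1)   # m = 3, so R_m = 1 and S_m - m = -2
fair_R = conditional_mean(walk, past, 6)                      # zero-mean steps
drift_R = conditional_mean(walk, past, 6, probs=(0.7, 0.3))   # E[X_1] = 0.4 > 0
fair_S = conditional_mean(compensated, past, 6)
print(fair_R, drift_R, fair_S)
```

For zero-mean steps both conditional means stay at their current values ($R_m$ and $S_m - m$), while a positive step mean raises $E[R_n \mid \mathcal{F}_m]$ by $(n - m) E[X_1]$.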

Following are martingale properties resulting from their definition, Doob’s decom- 
position, an elementary construction of stochastic integrals, two martingale inequal- 
ities, and a brief discussion on stopped martingales. 

Let $X = (X_0, X_1, \ldots)$ be a sequence of random variables defined on a probability space $(\Omega, \mathcal{F}, P)$ with a filtration $(\mathcal{F}_n)_{n \ge 0}$. Then (1) if $X$ is a submartingale, martingale, or supermartingale, its expectation is an increasing, constant, or decreasing function of time, respectively, (2) $X$ is a martingale if it is both a submartingale and a supermartingale, (3) if $X$ is a submartingale, then $-X$ is a supermartingale, and (4) the third defining condition for martingales can be replaced with $E[X_{n+1} \mid \mathcal{F}_n] = X_n$, $n \ge 0$.

Theorem 2.17 Let $X = (X_0, X_1, \ldots)$ be a martingale on a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_n)_{n \ge 0}, P)$. The series $Y = (Y_0, Y_1, \ldots)$ defined by $Y_0 = X_0 - E[X_0]$ and $Y_n = X_n - X_{n-1}$, $n \ge 1$, is orthogonal, that is, $E[Y_m Y_n] = 0$ for $m \ne n$.

Proof The properties of $X$ imply that $Y_n$ is integrable, $\mathcal{F}_n$-measurable, and satisfies $E[Y_n \mid \mathcal{F}_m] = 0$, $n > m$. For $n > m$, we have $E[Y_n Y_m] = E\{E[Y_n Y_m \mid \mathcal{F}_m]\} = E\{Y_m E[Y_n \mid \mathcal{F}_m]\} = 0$ since $Y_m$ is $\mathcal{F}_m$-measurable and $E[Y_n \mid \mathcal{F}_m] = 0$. ▲

Theorem 2.18 (Doob decomposition) If $X = (X_0, X_1, \ldots)$ is an $\mathcal{F}_n$-submartingale, then there is an $\mathcal{F}_n$-martingale $M_n$ and an increasing process $A_n$ with the properties $A_0 = 0$ and $A_n \in \mathcal{F}_{n-1}$, $n \ge 1$, such that the representation

$$X_n = A_n + M_n, \quad n \ge 0, \tag{2.56}$$

holds and is unique. The representation shows that submartingales have a predictable part $A_n$ that can be told ahead of time and an unpredictable part $M_n$.

Proof We first show that (2.56) is unique provided it exists. Note that $E[X_n \mid \mathcal{F}_{n-1}] = E[A_n \mid \mathcal{F}_{n-1}] + E[M_n \mid \mathcal{F}_{n-1}] = A_n + M_{n-1}$ for $n \ge 1$. Substituting $M_{n-1}$ in this equation with its expression from (2.56), we obtain the recurrence formula $A_n = A_{n-1} + E[X_n \mid \mathcal{F}_{n-1}] - X_{n-1}$, which defines $A$ uniquely since $A_0 = 0$.

We now show that the decomposition in (2.56) exists, that is, that there are processes $A$ and $M$ with the stated properties. Let $A_n$, $n = 0, 1, \ldots$, be defined by the above recurrence formula with $A_0 = 0$. Note that $A_n \in \mathcal{F}_{n-1}$ and $A_n \ge A_{n-1}$ since $X_n$ is a submartingale, so that $A_n$, $n = 0, 1, \ldots$, has the stated properties. We also have

$$
\begin{aligned}
E[M_n \mid \mathcal{F}_{n-1}] &= E[X_n - A_n \mid \mathcal{F}_{n-1}] \\
&= E[X_n \mid \mathcal{F}_{n-1}] - E\big[A_{n-1} + E[X_n \mid \mathcal{F}_{n-1}] - X_{n-1} \mid \mathcal{F}_{n-1}\big] \\
&= -A_{n-1} + X_{n-1} = M_{n-1}
\end{aligned}
$$

so that $M$ is an $\mathcal{F}_n$-martingale. ▲
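
The recurrence $A_n = A_{n-1} + E[X_n \mid \mathcal{F}_{n-1}] - X_{n-1}$ used in the proof translates directly into a loop along one sample path. The sketch below assumes the one-step conditional mean is available as a user-supplied function (here called `cond_mean`, a hypothetical name) and illustrates it on the squared symmetric $\pm 1$ walk, for which $E[R_k^2 \mid \mathcal{F}_{k-1}] = R_{k-1}^2 + 1$.

```python
def doob_decomposition(path, cond_mean):
    """Split a sample path x_0..x_n of a submartingale into a predictable
    part A and a martingale part M = X - A.
    cond_mean(k, history) must return E[X_k | F_{k-1}] for the observed
    history x_0..x_{k-1}; it is supplied by the user (an assumption)."""
    a = [0.0]
    for k in range(1, len(path)):
        a.append(a[-1] + cond_mean(k, path[:k]) - path[k - 1])
    m = [x - ak for x, ak in zip(path, a)]
    return a, m

steps = [1, -1, -1, 1, 1, 1]      # one assumed realization of the steps
walk = [0]
for s in steps:
    walk.append(walk[-1] + s)
squared = [r * r for r in walk]   # S_n = R_n^2 along this path

A, M = doob_decomposition(squared, lambda k, history: history[-1] + 1.0)
print(A)   # increases by E[X_1^2] = 1 per step
print(M)   # equals S_n - n along the path
```

The predictable part depends only on the conditional means, not on the realized increments, which is why it "can be told ahead of time".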

Example 2.62 Let $X_n$, $n \ge 0$, be a square integrable martingale. Then $X_n^2$ is a submartingale that admits the Doob decomposition in (2.56) with $M_n = X_n^2 - A_n$ and $A_n = \sum_{i=1}^n E[X_i^2 - X_{i-1}^2 \mid \mathcal{F}_{i-1}]$. ◊

Proof The process $X_n^2$ satisfies the first two defining conditions for martingales, and

$$
\begin{aligned}
E[X_n^2 \mid \mathcal{F}_{n-1}] &= E[(X_n - X_{n-1})^2 + 2 X_n X_{n-1} - X_{n-1}^2 \mid \mathcal{F}_{n-1}] \\
&= E[(X_n - X_{n-1})^2 \mid \mathcal{F}_{n-1}] + X_{n-1}^2 \ge X_{n-1}^2
\end{aligned}
$$

since $E[(X_n - X_{n-1})^2 \mid \mathcal{F}_{n-1}] \ge 0$, $X_{n-1}$ is $\mathcal{F}_{n-1}$-measurable, and $X_n$ is a martingale. Hence, $X_n^2$ is a submartingale, so that (2.56) holds with $\{A_n\}$ given by $A_n = A_{n-1} + E[X_n^2 \mid \mathcal{F}_{n-1}] - X_{n-1}^2$, $n \ge 1$, and $A_0 = 0$. ▲

Example 2.63 Let $X_n$ denote our fortune after $n$ rounds of a game with unit stake. Suppose $m \ge 1$ rounds have been completed in this game, so that $X_n - X_m$, $n > m$, gives our net total winnings/losses in the future rounds $m+1, \ldots, n$. The best m.s. estimator of $X_n - X_m$, $n > m$, given our knowledge $\mathcal{F}_m$ after $m$ rounds is the conditional expectation $E[X_n - X_m \mid \mathcal{F}_m]$, where $\mathcal{F}_m = \sigma(X_1, \ldots, X_m)$, $m \ge 1$, and $\mathcal{F}_0 = \{\emptyset, \Omega\}$. If $X_0, X_1, \ldots$ is a martingale, then $E[X_n - X_m \mid \mathcal{F}_m] = 0$, that is, our average fortune $E[X_n \mid \mathcal{F}_m]$ at time $n > m$ is equal to our current fortune $X_m$.

Suppose now that stakes $A_i$, $i = 0, 1, \ldots$, other than one are allowed, where $A_0 = 0$. Since stakes for round $m+1$ are decided based on knowledge accumulated after $m$ rounds, $A_{m+1}$ is $\mathcal{F}_m$-measurable. Processes with this property are said to be predictable processes. The sequence $M = (M_0, M_1, M_2, \ldots)$ defined by

$$M_n = \sum_{i=1}^n A_i\,(X_i - X_{i-1}), \quad n = 1, 2, \ldots, \tag{2.57}$$

with $M_0 = 0$ and $X_0 = 0$ gives our total fortune after $n \ge 1$ rounds and constitutes a discrete version of the stochastic integral studied later in the book (Chap. 4). The integrand $\{A_i\}$ is a predictable process and the integrator is defined by increments $\{X_i - X_{i-1}\}$ of a martingale. ◊
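
The sum in (2.57) can be evaluated path by path once a staking rule is fixed. The sketch below assumes a fair game with increments $\pm 1$ and a "double after a loss" staking rule; both the rule and the function names are illustrative. The mean of $M_n$ is obtained by enumerating all $2^n$ equally likely paths.

```python
from itertools import product

def transform(increments, stake):
    """M_n = sum_i A_i (X_i - X_{i-1}); the stake A_i is computed from the
    increments of rounds 1..i-1 only, so the stake process is predictable."""
    m = 0.0
    for i, dx in enumerate(increments):
        m += stake(increments[:i]) * dx
    return m

def doubling(past):
    """Assumed staking rule: start at 1, double after a loss, reset after a win."""
    s = 1.0
    for dx in past:
        s = 2.0 * s if dx < 0 else 1.0
    return s

n = 8
paths = list(product((1, -1), repeat=n))   # all equally likely paths of a fair game
mean_M = sum(transform(p, doubling) for p in paths) / len(paths)
print(mean_M)
```

The staking rule changes the distribution of the final fortune but not its mean, since the stake for each round is fixed before the outcome of that round is known.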

Theorem 2.19 Let $X = (X_0, X_1, \ldots)$ be a square integrable $\mathcal{F}_n$-martingale and $A = (A_0, A_1, \ldots)$ be an $\mathcal{F}_n$-predictable process such that $A_0 = 0$ and $E[A_n^2] < \infty$. Then $M_n$ in (2.57) is an $\mathcal{F}_n$-martingale.

Proof Note that $E[|M_n|] \le \sum_{i=1}^n E[|A_i|\,|X_i - X_{i-1}|] \le \sum_{i=1}^n \left(E[A_i^2]\,E[(X_i - X_{i-1})^2]\right)^{1/2}$ by the Cauchy-Schwarz inequality, so that $E[|M_n|] < \infty$ since $A_i$ and $X_i$ have finite second moments. That $M_n \in \mathcal{F}_n$ follows from its definition and the properties of $A_n$ and $X_n$. For $n > m$, we have

$$
\begin{aligned}
E[M_n \mid \mathcal{F}_m] &= E\Big[M_m + \sum_{i=m+1}^n A_i (X_i - X_{i-1}) \,\Big|\, \mathcal{F}_m\Big] \\
&= M_m + \sum_{i=m+1}^n E[A_i (X_i - X_{i-1}) \mid \mathcal{F}_m] \\
&= M_m + \sum_{i=m+1}^n E\{E[A_i (X_i - X_{i-1}) \mid \mathcal{F}_{i-1}] \mid \mathcal{F}_m\} = M_m
\end{aligned}
$$

since $A_i \in \mathcal{F}_{i-1}$ and $X_i$ is a martingale so that $E[A_i (X_i - X_{i-1}) \mid \mathcal{F}_{i-1}] = A_i\,E[X_i - X_{i-1} \mid \mathcal{F}_{i-1}] = 0$. ▲

Definition 2.41 An $\{0, 1, \ldots\}$-valued random variable $T$ defined on a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_n)_{n \ge 0}, P)$ is a stopping time with respect to $\mathcal{F}_n$, $n = 0, 1, \ldots$, or an $\mathcal{F}_n$-stopping time if $\{T \le n\} \in \mathcal{F}_n$ for each $n \ge 0$.

Stopping times are useful for both applications and theoretical considerations. For example, suppose $\{X_n\}$ is the state of a physical system that performs according to specifications as long as its state does not exceed a critical value $x_{cr}$. The failure time $T = \min\{n : X_n > x_{cr}\}$ is a stopping time. Stopping times are also useful tools for constructing stochastic integrals (Sect. 4.4.3). Useful information on stopping times can be found in [1] (Sect. 2.2), [7] (Sect. 2.16), [10] (Sect. 1.5), and [13] (Sect. 2.2).

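Theorem 2.20 $T$ is a stopping time if and only if $\{T = n\} \in \mathcal{F}_n$ for all $n \ge 0$.

Proof If $T$ is a stopping time, then $\{T \le n\} \in \mathcal{F}_n$ and $\{T \le n-1\}^c \in \mathcal{F}_{n-1} \subset \mathcal{F}_n$ so that $\{T = n\} = \{T \le n\} \cap \{T \le n-1\}^c \in \mathcal{F}_n$. If $\{T = n\} \in \mathcal{F}_n$ for each $n \ge 0$, then $\{T = m\} \in \mathcal{F}_m \subset \mathcal{F}_n$ for $m \le n$ and $\{T \le n\} = \cup_{m=0}^n \{T = m\} \in \mathcal{F}_n$. ▲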

Definition 2.42 Let $X = (X_0, X_1, \ldots)$ be an $\mathcal{F}_n$-submartingale, martingale, or supermartingale and let $T$ denote an $\mathcal{F}_n$-stopping time. Then $X_n^T(\omega) = X_{n \wedge T(\omega)}(\omega)$, $n = 0, 1, \ldots$, is called the sequence $X$ stopped at $T$. Note that the samples of $\{X_n^T\}$ are constant at times exceeding $T$.

Theorem 2.21 If $T$ is an $\mathcal{F}_n$-stopping time and $X_n$ is an $\mathcal{F}_n$-submartingale, martingale, or supermartingale, so is $X_n^T$.

Proof Since

$$E[|X_{n \wedge T}|] = \sum_{k=0}^n \int_{\{T = k\}} |X_k|\,dP + \int_{\{T > n\}} |X_n|\,dP,$$

$E[|X_n|] < \infty$, $\int_{\{T = k\}} |X_k|\,dP \le E[|X_k|]$, and $\int_{\{T > n\}} |X_n|\,dP \le E[|X_n|]$, $X_n^T$ is integrable. $X_{n \wedge T}$ is $\mathcal{F}_n$-measurable for all $n \ge 0$ since $X_{n \wedge T} = \sum_{k=0}^n X_k 1(T = k) + X_n 1(T > n)$, $X_k \in \mathcal{F}_k \subset \mathcal{F}_n$ for $k \le n$, $1(T = k) \in \mathcal{F}_k \subset \mathcal{F}_n$, and $1(T > n) \in \mathcal{F}_n$. The representation $X_{n \wedge T} = \sum_{k=0}^n X_k 1(T = k) + X_n 1(T > n)$ implies $X_{(n+1) \wedge T} - X_{n \wedge T} = (X_{n+1} - X_n)\,1(T > n)$ so that $E[X_{(n+1) \wedge T} - X_{n \wedge T} \mid \mathcal{F}_n] = 1(T > n)\,E[X_{n+1} - X_n \mid \mathcal{F}_n]$ since $1(T > n)$ is $\mathcal{F}_n$-measurable. If $X$ is a submartingale, martingale, or supermartingale, $E[X_{n+1} - X_n \mid \mathcal{F}_n]$ is positive, zero, or negative, that is, $X_n^T$ is a submartingale, martingale, or supermartingale, respectively. ▲

Theorem 2.22 (Optional stopping theorem) If (1) $X$ is an $\mathcal{F}_n$-martingale, (2) $T$ is a stopping time with respect to $\mathcal{F}_n$ such that $T < \infty$ a.s., (3) $X_T$ is integrable, and (4) $E[X_n 1(T > n)] \to 0$ as $n \to \infty$, then $E[X_T] = \mu$, where $\mu = E[X_n]$.

Proof Since $X_T = X_{n \wedge T} + (X_T - X_n)\,1(T > n)$ and $X$ is a martingale, we have $E[X_T] = \mu + E[X_T 1(T > n)] - E[X_n 1(T > n)]$. The expectation $E[X_n 1(T > n)]$ converges to zero as $n \to \infty$ by hypothesis. The expectation $E[X_T 1(T > n)] = \sum_{k=n+1}^\infty E[X_k 1(T = k)]$ also converges to zero as $n \to \infty$ since $|E[X_T 1(T > n)]| \le E[|X_T|]$ and $X_T$ is integrable by assumption so that the series $\sum_k E[X_k 1(T = k)]$ is convergent. Hence, the expectation of $X_T$ is $\mu$. ▲
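
A standard illustration of the optional stopping theorem, not taken from the text, is the symmetric $\pm 1$ random walk stopped when it first leaves the interval $(-a, b)$. The stopped value is bounded, so the hypotheses hold, and $E[R_T] = 0$ forces $P(R_T = b) = a/(a+b)$. The sketch below checks this by simulation; the barrier values, seed, and sample size are assumed for illustration.

```python
import random

def exit_value(a, b, rng):
    """Run a symmetric +/-1 walk from 0 until it first hits -a or b."""
    r = 0
    while -a < r < b:
        r += 1 if rng.random() < 0.5 else -1
    return r

rng = random.Random(1)        # fixed seed, assumed for reproducibility
a, b, n = 3, 7, 20000
values = [exit_value(a, b, rng) for _ in range(n)]
mean_RT = sum(values) / n                    # estimate of E[R_T], expected near 0
frac_b = sum(v == b for v in values) / n     # estimate of P(R_T = b), expected near a/(a+b)
print(mean_RT, frac_b, a / (a + b))
```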

We conclude with two inequalities that are useful in applications. The proof of these and other inequalities can be found in, for example, [5] (Chap. 24) and [11] (Chap. 10).

Theorem 2.23 (Doob maximal inequality) If $X = (X_0, X_1, \ldots)$ is a positive $\mathcal{F}_n$-submartingale and $\lambda > 0$ is an arbitrary constant, then

$$P\Big(\max_{0 \le k \le n} X_k \ge \lambda\Big) \le \frac{1}{\lambda}\,E\Big[X_n\,1\Big(\max_{0 \le k \le n} X_k \ge \lambda\Big)\Big]. \tag{2.58}$$

Theorem 2.24 (Doob maximal $L_2$ inequality) If $X = (X_0, X_1, \ldots)$ is a square integrable positive $\mathcal{F}_n$-submartingale, then

$$E\Big[\Big(\max_{0 \le k \le n} X_k\Big)^2\Big] \le 4\,E[X_n^2]. \tag{2.59}$$

2.13 Monte Carlo Simulation 

Let $X$ be an $\mathbb{R}^d$-valued random variable with distribution $F$ that is defined on a probability space $(\Omega, \mathcal{F}, P)$. Our objectives are to generate independent samples of $X$ and estimate properties of $X$ from its samples.

2.13.1 Gaussian Variables

Let $X \sim N(\mu, \gamma)$ be an $\mathbb{R}^d$-valued Gaussian variable with mean $\mu$ and covariance matrix $\gamma$. A useful representation of $X$ is provided by the Cholesky decomposition showing that

$$X = \mu + \beta\,G \sim N(\mu, \gamma), \tag{2.60}$$

where $G$ is an $\mathbb{R}^d$-valued random variable with independent $N(0, 1)$ coordinates and $\beta$ is a lower triangular matrix whose non-zero entries are

$$\beta_{ij} = \frac{\gamma_{ij} - \sum_{r=1}^{j-1} \beta_{ir}\,\beta_{jr}}{\left(\gamma_{jj} - \sum_{r=1}^{j-1} \beta_{jr}^2\right)^{1/2}}, \quad 1 \le j \le i \le d, \tag{2.61}$$

with the convention $\sum_{r=1}^{0} \beta_{ir}\,\beta_{jr} = 0$.
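
The representation (2.60) with the entries (2.61) can be sketched in a few lines. The example below is an illustration under assumed inputs (a $2 \times 2$ covariance matrix, a mean vector, and a fixed seed), and uses the standard Cholesky recursion column by column; the function names are not from the text.

```python
import math
import random

def cholesky_lower(gamma):
    """Lower triangular beta with beta * beta^T = gamma, computed column by
    column from the standard Cholesky recursion."""
    d = len(gamma)
    beta = [[0.0] * d for _ in range(d)]
    for j in range(d):
        root = math.sqrt(gamma[j][j] - sum(beta[j][r] ** 2 for r in range(j)))
        for i in range(j, d):
            partial = sum(beta[i][r] * beta[j][r] for r in range(j))
            beta[i][j] = (gamma[i][j] - partial) / root
    return beta

def sample_gaussian(mu, beta, rng):
    """One sample of X = mu + beta G with independent N(0,1) coordinates in G."""
    g = [rng.gauss(0.0, 1.0) for _ in mu]
    return [mu[i] + sum(beta[i][k] * g[k] for k in range(i + 1)) for i in range(len(mu))]

gamma = [[4.0, 2.0], [2.0, 3.0]]      # assumed covariance matrix
mu = [1.0, -1.0]                      # assumed mean vector
beta = cholesky_lower(gamma)
rng = random.Random(0)
xs = [sample_gaussian(mu, beta, rng) for _ in range(20000)]
m1 = sum(x[0] for x in xs) / len(xs)
m2 = sum(x[1] for x in xs) / len(xs)
cov12 = sum((x[0] - m1) * (x[1] - m2) for x in xs) / len(xs)
print(beta, cov12)                    # cov12 should be close to gamma[0][1]
```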

Samples of $X$ can be calculated from (2.60) in which $G$ is replaced with samples of this vector generated by, for example, the MATLAB function randn. Alternatively, algorithms using memoryless transformations of some random variables can be used to generate independent samples of $N(0, 1)$ variables. For example, $Z_1 = \sqrt{-2 \ln(U_1)}\,\cos(2\pi U_2)$ and $Z_2 = \sqrt{-2 \ln(U_1)}\,\sin(2\pi U_2)$ are independent $N(0, 1)$ variables, where $U_1, U_2 \sim U(0, 1)$ are independent random variables distributed uniformly in $(0, 1)$ [3, 6, 12].
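
The transformation of two uniform variables into two independent $N(0, 1)$ variables can be sketched as follows. The seed and sample size are assumed for illustration, and $U_1$ is drawn from $(0, 1]$ to avoid evaluating $\ln(0)$.

```python
import math
import random

def box_muller(rng):
    """Return a pair (Z1, Z2) of independent N(0,1) samples from two uniforms."""
    u1 = 1.0 - rng.random()      # in (0, 1], so log(u1) is finite
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)

rng = random.Random(0)
pairs = [box_muller(rng) for _ in range(20000)]
z1 = [p[0] for p in pairs]
z2 = [p[1] for p in pairs]
mean1 = sum(z1) / len(z1)
var1 = sum(z * z for z in z1) / len(z1) - mean1 ** 2
corr = sum(a * b for a, b in pairs) / len(pairs)   # estimate of E[Z1 Z2]
print(mean1, var1, corr)
```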


2.13.2 Non-Gaussian Variables 

Let $X$ be a real-valued, non-Gaussian random variable with distribution $F$ that is continuous. Samples of $X$ can be calculated from samples of $U(0, 1)$ by the following transformation.

Theorem 2.25 If $X$ is a real-valued random variable with continuous distribution $F$, then

$$X \stackrel{d}{=} F^{-1}(U(0, 1)). \tag{2.62}$$

Proof Since $P(F^{-1}(U(0, 1)) \le x) = P(U(0, 1) \le F(x)) = F(x)$, (2.62) holds, so that samples of $X$ can be calculated from samples of $U(0, 1)$ and the representation of $X$ in (2.62). For example, $n$ independent samples of an exponential random variable with mean $1/\lambda$, $\lambda > 0$, can be obtained from $-\ln(1 - \mathrm{rand}(n, 1))/\lambda \stackrel{d}{=} -\ln(\mathrm{rand}(n, 1))/\lambda$, where rand is a MATLAB function generating samples of $U(0, 1)$. ▲
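
The exponential case mentioned in the proof can be sketched with the standard library generator in place of MATLAB's rand. The rate, seed, and sample size below are assumed for illustration.

```python
import math
import random

def exponential_samples(lam, n, rng):
    """n samples of an exponential variable with mean 1/lam, from
    X = F^{-1}(U) with F(x) = 1 - exp(-lam x)."""
    # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1] and the log is finite
    return [-math.log(1.0 - rng.random()) / lam for _ in range(n)]

rng = random.Random(0)
lam = 2.0
samples = exponential_samples(lam, 50000, rng)
sample_mean = sum(samples) / len(samples)
print(sample_mean, 1.0 / lam)
```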
Suppose now that $X$ is an $\mathbb{R}^d$-valued random variable with continuous distribution $F$. Let $F_1$ and $F_{k \mid k-1, \ldots, 1}$, $k = 2, \ldots, d$, denote the distributions of the coordinate $X_1$ of $X$ and of the conditional random variable $X_k \mid (X_{k-1}, \ldots, X_1)$, respectively.

Theorem 2.26 Let $Z = (Z_1, \ldots, Z_d)$ be an $\mathbb{R}^d$-valued random variable defined by

$$
\begin{aligned}
& F_1(Z_1) = U_1, \\
& F_{k \mid k-1, \ldots, 1}(Z_k \mid Z_{k-1}, \ldots, Z_1) = U_k, \quad k = 2, \ldots, d,
\end{aligned}
\tag{2.63}
$$

where $\{U_k, k = 1, \ldots, d\}$ are independent $U(0, 1)$ random variables. Then, $X$ and $Z$ have the same distribution.

Proof That $P(Z_1 \le z_1) = F_1(z_1)$ follows from Theorem 2.25. Since $Z_2 \mid Z_1 = F_{2 \mid 1}^{-1}(U_2 \mid Z_1)$, we have

$$
\begin{aligned}
P(Z_2 \le z_2 \mid Z_1 = z_1) &= P\big(F_{2 \mid 1}^{-1}(U_2 \mid z_1) \le z_2\big) \\
&= P\big(U_2 \le F_{2 \mid 1}(z_2 \mid z_1)\big) = F_{2 \mid 1}(z_2 \mid z_1),
\end{aligned}
$$

and so on. ▲

Let $(u_1, \ldots, u_d)$ be a sample of $(U_1, \ldots, U_d)$. The corresponding sample $(z_1, \ldots, z_d)$ of $Z$ can be calculated from (2.63) sequentially beginning with $z_1 = F_1^{-1}(u_1)$ and continuing with $F_{k \mid k-1, \ldots, 1}(z_k \mid z_{k-1}, \ldots, z_1) = u_k$ for increasing values of $k \ge 2$.

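Example 2.64 Let $X = (X_1, X_2)$ be a non-Gaussian vector with $X_1 \sim N(\mu, \sigma^2)$ and $X_2 \mid (X_1 = x_1) \sim N(x_1, \beta^2)$. The density of $X$ is

$$f(x_1, x_2) = \frac{1}{\sigma\,\beta}\,\phi\Big(\frac{x_1 - \mu}{\sigma}\Big)\,\phi\Big(\frac{x_2 - x_1}{\beta}\Big),$$

where $\phi$ denotes the density of $N(0, 1)$. The mapping in (2.63) becomes $Z_1 = \mu + \sigma\,\Phi^{-1}(U_1)$ and $Z_2 \mid (Z_1 = z_1) = z_1 + \beta\,\Phi^{-1}(U_2)$, where $U_1$ and $U_2$ are independent copies of $U(0, 1)$. ◊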
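
The sequential mapping of this example can be sketched with the inverse Gaussian distribution available in the Python standard library (`statistics.NormalDist.inv_cdf`). The parameter values, seed, and helper names below are assumed for illustration. Since $Z_2 = Z_1 + \beta\,\Phi^{-1}(U_2)$, the samples of $Z_2$ should have mean $\mu$ and variance $\sigma^2 + \beta^2$.

```python
import random
from statistics import NormalDist

std_normal = NormalDist()   # N(0,1); inv_cdf plays the role of Phi^{-1}

def open_uniform(rng):
    """Uniform sample in the open interval (0, 1), as inv_cdf requires."""
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return u

def sample_z(mu, sigma, beta, rng):
    """Sequential sampling: first Z1 from its marginal, then Z2 given Z1."""
    z1 = mu + sigma * std_normal.inv_cdf(open_uniform(rng))
    z2 = z1 + beta * std_normal.inv_cdf(open_uniform(rng))
    return z1, z2

rng = random.Random(0)
mu, sigma, beta = 1.0, 0.5, 0.2      # assumed parameter values
zs = [sample_z(mu, sigma, beta, rng) for _ in range(20000)]
mean_z2 = sum(z[1] for z in zs) / len(zs)
var_z2 = sum((z[1] - mean_z2) ** 2 for z in zs) / len(zs)
print(mean_z2, var_z2, sigma ** 2 + beta ** 2)
```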

Example 2.65 Let $X \in \mathbb{R}^d$ be a translation vector, that is, $X = g(Y)$, where $Y$ is an $\mathbb{R}^d$-valued Gaussian vector with mean zero, covariance matrix $\rho = \{\rho_{ij} = E[Y_i Y_j]\}$ such that $\rho_{ii} = 1$, $i = 1, \ldots, d$, and $g : \mathbb{R}^d \to \mathbb{R}^d$ is Borel measurable. Samples of $X$ can be generated from samples of $Y$ and the definition of $X$ or from samples of $U(0, 1)$, the distribution of $X$, and the algorithm in (2.63). The latter approach is less efficient for translation random vectors. If the mapping $Y \mapsto X$ is given by $X_i = F_i^{-1}(\Phi(Y_i)) = g_i(Y_i)$, $i = 1, \ldots, d$, where $F_i$ are continuous distributions, then the coordinates of $X$ have the distributions $F_i$. ◊

Proof Let $y_i = g_i^{-1}(x_i)$ and $y = (y_1, \ldots, y_d)$. The distribution

$$P(X_1 \le x_1, \ldots, X_d \le x_d) = P(Y_1 \le y_1, \ldots, Y_d \le y_d) = \Phi_d(y; \rho),$$

called multivariate translation distribution, can be used as input to the Monte Carlo simulation algorithm based on (2.63), where $\Phi_d(\cdot\,; \rho)$ denotes the joint distribution function of the Gaussian vector $Y \sim N(0, \rho)$. If $X_i = F_i^{-1}(\Phi(Y_i))$, then $P(X_i \le x_i) = P(Y_i \le \Phi^{-1}(F_i(x_i))) = F_i(x_i)$, $i = 1, \ldots, d$. Details on translation random variables can be found in [6] (Sect. 3.1.1). ▲


2.13.3 Estimators 


Let $X$ be a real-valued random variable with distribution $F$ and $h : \mathbb{R} \to \mathbb{R}$ be a measurable function. Our objective is to estimate the expectation $E[h(X)]$ from $n$ independent samples of $X$. The expectation $E[h(X)]$ is of interest in applications since it provides useful information on $X$. For example, if $h(X) = X^r$ and $r \ge 1$ is an integer, then $E[h(X)]$ is the moment of order $r$ of $X$. If $h(X) = 1(X > a)$, $a \in \mathbb{R}$, then $E[h(X)] = E[1(X > a)] = P(X > a)$.

Theorem 2.27 Let $h : \mathbb{R} \to \mathbb{R}$ be a measurable function and let $X_1, \ldots, X_n$ be $n$ independent copies of $X$ such that $E[h(X)^2] < \infty$. The estimator

$$\hat{Y} = \frac{1}{n} \sum_{i=1}^n h(X_i) \tag{2.64}$$

is unbiased, that is, $E[\hat{Y}] = E[Y]$, and $\mathrm{Var}[\hat{Y}] \to 0$ as $n \to \infty$, where $Y = h(X)$.

Proof We have $E[\hat{Y}] = (1/n) \sum_{i=1}^n E[h(X_i)] = E[Y]$ since $X_i$ have the same distribution. Also,

$$E[\hat{Y}^2] = \frac{1}{n^2}\Big[n\,E[h(X_1)^2] + (n^2 - n)\,(E[h(X_1)])^2\Big]$$

so that $\mathrm{Var}[\hat{Y}] = \mathrm{Var}[Y]/n = \mathrm{Var}[h(X_1)]/n$. The coefficient of variation of estimator $\hat{Y}$ is $\mathrm{cov}[\hat{Y}] = (\mathrm{Var}[\hat{Y}])^{1/2}/E[\hat{Y}] = \mathrm{cov}[h(X_1)]/\sqrt{n}$. ▲

The estimator $\hat{Y}$ is guaranteed to approximate $E[Y] = E[h(X)]$ accurately for a sufficiently large $n$. Yet, the required sample size $n$ may be so large that the use of $\hat{Y}$ becomes impractical. For example, suppose our objective is to estimate the probability $P(X > a)$, $X \sim N(0, 1)$, that is, the expectation $E[1(X > a)]$. The mean and variance of $\hat{Y}$ are $P(X > a)$ and $P(X > a)\,P(X \le a)/n$, respectively, so that $\mathrm{cov}[\hat{Y}] = \sqrt{P(X \le a)/(n\,P(X > a))}$. For $a = 5$ we have $E[\hat{Y}] = E[1(X > a)] = \Phi(-5) = 2.8665 \times 10^{-7}$, $\mathrm{Var}[\hat{Y}] \simeq \Phi(-5)/n$, and $\mathrm{cov}[\hat{Y}] \simeq 1/\sqrt{n\,\Phi(-5)}$. To have a coefficient of variation of 0.1 we need $n \simeq 1/(0.01\,\Phi(-5))$, that is, about $3.5 \times 10^8$ samples. A much larger sample size would be needed for a threshold $a > 5$.
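
The sample size implied by a target coefficient of variation follows from solving $\sqrt{P(X \le a)/(n\,P(X > a))} = c$ for $n$. The short sketch below evaluates it for $a = 5$ and $c = 0.1$; the function name is an illustrative choice, and the result is of the order of $3.5 \times 10^8$.

```python
from statistics import NormalDist

def required_samples(a, target_cov):
    """Number of direct Monte Carlo samples needed to estimate P(X > a),
    X ~ N(0,1), with the given coefficient of variation."""
    p = NormalDist().cdf(-a)                  # P(X > a) = Phi(-a)
    return (1.0 - p) / (p * target_cov ** 2)

p_tail = NormalDist().cdf(-5.0)
n_needed = required_samples(5.0, 0.1)
print(p_tail, n_needed)
```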

More efficient Monte Carlo algorithms, referred to here as improved Monte Carlo algorithms, can be constructed by measure change. Let $P$ and $Q$ be two probabilities on a measurable space $(\Omega, \mathcal{F})$ such that $P \ll Q$, that is, $P$ is absolutely continuous with respect to $Q$ (Definition 2.29). Then, there exists a positive measurable function $g = dP/dQ : (\Omega, \mathcal{F}) \to ([0, \infty), \mathcal{B}([0, \infty)))$, called the Radon-Nikodym derivative, such that $P(A) = \int_A g(\omega)\,Q(d\omega)$, $A \in \mathcal{F}$ (Sect. 2.8). Following are examples illustrating the construction of Monte Carlo simulation algorithms based on the Radon-Nikodym derivative.

Example 2.66 Let $X = \sum_{k=1}^m a_k 1_{A_k}$ be a simple random variable defined on a probability space $(\Omega, \mathcal{F}, P)$, where $\{A_k \in \mathcal{F}, k = 1, \ldots, m\}$ is a measurable partition of $\Omega$ such that $P(A_k) > 0$ and $|a_k| < \infty$, $k = 1, \ldots, m$. Let $h : \mathbb{R} \to \mathbb{R}$ be a measurable function. The expectation of $h(X)$ with respect to the probability measure $P$ is $E_P[h(X)] = \sum_{k=1}^m h(a_k)\,P(A_k)$, for example, $E_P[h(X)] = P(X > a)$ for $h(x) = 1(x > a)$.

Consider another probability measure $Q$ on the measurable space $(\Omega, \mathcal{F})$ such that $Q(A_k) > 0$, $k = 1, \ldots, m$. We have

$$E_P[h(X)] = \sum_{k=1}^m h(a_k)\,P(A_k) = \sum_{k=1}^m \Big[h(a_k)\,\frac{P(A_k)}{Q(A_k)}\Big]\,Q(A_k), \tag{2.65}$$

that is, $E_P[h(X)]$ can be calculated as the expectation of the random variable $\tilde{X} = \sum_{k=1}^m h(a_k)\,(P(A_k)/Q(A_k))\,1_{A_k}$ with respect to the probability measure $Q$. ◊
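
The identity (2.65) can be checked on a small discrete example. The values, the two probability vectors, and the threshold below are assumed for illustration; the exact sums use rational arithmetic, and the last lines estimate the same expectation from samples drawn under $Q$.

```python
import random
from fractions import Fraction

a_vals = [0, 1, 2, 3]                                  # assumed values a_k
P = [Fraction(60, 100), Fraction(30, 100), Fraction(9, 100), Fraction(1, 100)]
Q = [Fraction(1, 4)] * 4                               # assumed sampling measure
h = lambda x: 1 if x > 1 else 0                        # h(x) = 1(x > a) with a = 1

e_p = sum(h(a) * p for a, p in zip(a_vals, P))                   # expectation under P
e_q = sum(h(a) * (p / q) * q for a, p, q in zip(a_vals, P, Q))   # same sum, reweighted under Q

# Sample index k under Q and average h(a_k) P(A_k)/Q(A_k).
rng = random.Random(0)
n = 20000
draws = rng.choices(range(len(a_vals)), weights=[float(q) for q in Q], k=n)
estimate = sum(h(a_vals[k]) * float(P[k] / Q[k]) for k in draws) / n
print(e_p, e_q, estimate)
```

Under $Q$ the rare values $a_k > 1$ are sampled a quarter of the time each, and the weights $P(A_k)/Q(A_k)$ correct for the oversampling.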

Example 2.67 Suppose our objective is to estimate the probability $P(X > a)$ by Monte Carlo simulation, where $X$ is a real-valued random variable defined on a probability space $(\Omega, \mathcal{F}, P)$. Let

$$\hat{p}_{MC}(a) = \frac{1}{n} \sum_{i=1}^n 1(x_i > a) \quad \text{and} \quad \hat{p}_{IMC}(a) = \frac{1}{n} \sum_{i=1}^n 1(z_i > a)\,\frac{f(z_i)}{q(z_i)}$$

be estimates of $P(X > a)$ by direct and improved Monte Carlo simulation, where $x_i$ and $z_i$ are independent samples generated from the densities $f$ and $q$, respectively, where $f(\xi) = dP(X \le \xi)/d\xi$, $q(\xi) = dQ(X \le \xi)/d\xi$, and $Q$ is a measure on $(\Omega, \mathcal{F})$ such that $P \ll Q$.

If $X = \exp(Y)$, $Y \sim N(1, (0.3)^2)$, then $P(X > a)$ is $0.3712$, $0.0211$, $0.1603 \times 10^{-3}$, $0.3293 \times 10^{-4}$, and $0.7061 \times 10^{-5}$ for $a = 3, 5, 8, 9$, and $10$, respectively. The corresponding estimates $\hat{p}_{MC}(a)$ based on 10000 samples are $0.3766$, $0.0235$, $0.2 \times 10^{-3}$, $0$, and $0$. The estimates $\hat{p}_{IMC}(a)$ based on the same number of samples are $0.3733$, $0.0212$, $0.1668 \times 10^{-3}$, $0.3304 \times 10^{-4}$, and $0.7084 \times 10^{-5}$ for $a = 3, 5, 8, 9$, and $10$, where $q(z) = \phi((z - a)/\sigma)/\sigma$ is the density of a Gaussian variable with mean $a$ and variance $\sigma^2$. While the estimates $\hat{p}_{IMC}(a)$ are satisfactory up to $a = 10$, the estimates $\hat{p}_{MC}(a)$ are inaccurate for $a \ge 8$. ◊

Proof The required probability is $P(X > a) = \int_{\mathbb{R}} 1(\xi > a)\,f(\xi)\,d\xi = E_P[1(X > a)]$, where $E_P$ denotes the expectation operator under $P$. We also have

$$P(X > a) = \int_{\mathbb{R}} \Big[1(\xi > a)\,\frac{f(\xi)}{q(\xi)}\Big]\,q(\xi)\,d\xi = E_Q\Big[1(X > a)\,\frac{f(X)}{q(X)}\Big].$$

The density $q$ has been selected such that 50% of its samples exceed $a$ and the ratio $f/q$ is bounded in $\mathbb{R}$. ▲

Example 2.68 Suppose our objective is to estimate the probability $P(X > a)$, where $X \sim N(0, 1)$. The coefficient of variation of the Monte Carlo estimator $\hat{Y}_{MC}$ in (2.64) with $h(X) = 1(X > a)$ is $2.30/\sqrt{n}$, $6.55/\sqrt{n}$, and $1870/\sqrt{n}$ for $a = 1, 2$, and $5$, respectively. Consider also an improved estimator for $P(X > a)$ defined by

$$\hat{Y}_{IMC} = \frac{1}{n} \sum_{i=1}^n 1(Z_i > a)\,\frac{\phi(Z_i)}{\phi^*(Z_i)}, \quad Z_i = a + \sigma X_i,$$

where $\phi(u) = \exp(-u^2/2)/\sqrt{2\pi}$, $\phi^*(u) = \exp(-(u - a)^2/(2\sigma^2))/(\sqrt{2\pi}\,\sigma)$, $\sigma > 0$, and $\{X_i\}$ denote independent copies of $X$. The coefficient of variation of $\hat{Y}_{IMC}$ is approximately $91.97/\sqrt{n}$, $91.21/\sqrt{n}$, and $61.96/\sqrt{n}$ for $a = 1, 2$, and $5$ if $\sigma = 0.1$ and $1.21/\sqrt{n}$, $1.59/\sqrt{n}$, and $2.41/\sqrt{n}$ for $a = 1, 2$, and $5$ if $\sigma = 1.0$. Note that $\hat{Y}_{MC}$ deteriorates rapidly as $a$ increases and that the efficiency of $\hat{Y}_{IMC}$ depends strongly on the selection of the sampling distribution $\phi^*$. ◊
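
An improved estimator of this type can be sketched as follows: samples are drawn from a Gaussian density centered at the threshold $a$ and reweighted by the ratio of the target density to the sampling density. The sketch assumes $\sigma = 1$, $a = 5$, a fixed seed, and 20000 samples; the function names are illustrative.

```python
import math
import random
from statistics import NormalDist

def gauss_density(u, mean=0.0, sd=1.0):
    """Density of N(mean, sd^2) at u."""
    return math.exp(-0.5 * ((u - mean) / sd) ** 2) / (math.sqrt(2.0 * math.pi) * sd)

def improved_mc(a, sigma, n, rng):
    """Estimate P(X > a), X ~ N(0,1), from samples of N(a, sigma^2)
    weighted by the density ratio phi / phi*."""
    total = 0.0
    for _ in range(n):
        z = a + sigma * rng.gauss(0.0, 1.0)       # sample from phi*
        if z > a:
            total += gauss_density(z) / gauss_density(z, a, sigma)
    return total / n

rng = random.Random(0)
a = 5.0
estimate = improved_mc(a, 1.0, 20000, rng)
exact = NormalDist().cdf(-a)                      # Phi(-5)
print(estimate, exact)
```

About half of the samples exceed $a$, so a few tens of thousands of samples give a usable estimate of a probability of order $10^{-7}$, whereas direct sampling of $N(0, 1)$ with the same sample size would almost surely return zero.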

Example 2.68 shows that the efficiency of improved Monte Carlo simulation depends essentially on the measure proposed for calculations. Since the coefficient of variation of $\hat{Y}_{IMC}$ is not available analytically, the selection of an optimal density $\phi^*(\cdot)$ requires extensive calculations. The coefficients of variation reported for $\hat{Y}_{IMC}$ have been estimated from Monte Carlo experiments performed for various $\sigma$.

There is no efficient procedure for selecting measures yielding accurate and efficient estimators even for one-dimensional problems, as considered in Example 2.68. The construction of improved estimators encounters additional difficulties when dealing with multidimensional problems. For example, consider the estimation of the expectation $E[1(X \in D)]$, where $X$ is an $\mathbb{R}^d$-valued random variable and $D$ is a subset of $\mathbb{R}^d$. The selection of a new measure for $X$ such that its samples under this measure fall in equal proportion in $D$ and $D^c$ is a rather complex task. The following example presents a multidimensional problem for which an improved Monte Carlo algorithm can be constructed simply.

Example 2.69 Let $p_s = P(X \in D)$ and $p_f = P(X \in D^c)$, where $D = \{x \in \mathbb{R}^d : \|x\| \le r\}$ is a sphere of radius $r > 0$ centered at the origin of $\mathbb{R}^d$ and $X$ is an $\mathbb{R}^d$-valued random variable with independent $N(0, 1)$ coordinates defined on a probability space $(\Omega, \mathcal{F}, P)$. The probability $p_f$ can be calculated from

$$p_f = E_P[1_{D^c}(X)] = E_Q\Big[1_{D^c}(X)\,\frac{f(X)}{q(X)}\Big], \tag{2.66}$$

where $Q$ is a measure on $(\Omega, \mathcal{F})$ such that $P \ll Q$. The densities $f$ and $q$ of the distributions induced by probability measures $P$ and $Q$ are

$$f(x) = (2\pi)^{-d/2} \exp\Big[-\frac{1}{2} \sum_{i=1}^d x_i^2\Big] \quad \text{and} \quad q(z) = [(2\pi)\,\sigma^2]^{-d/2} \exp\Big[-\frac{1}{2\sigma^2}\Big((z_1 - r)^2 + \sum_{i=2}^d z_i^2\Big)\Big].$$

Monte Carlo estimators based on the first and the second expressions of $p_f$ in (2.66) are denoted by $\hat{p}_{f,MC}$ and $\hat{p}_{f,IMC}$, respectively. The exact probability $p_f$ can be calculated from

$$p_f = 1 - \frac{1}{\Gamma(d/2)} \int_0^{r^2/2} \xi^{d/2 - 1}\,e^{-\xi}\,d\xi$$
and is equal to $0.0053$, $0.8414 \times 10^{-4}$, and $0.4073 \times 10^{-6}$ for $r = 5, 6$, and $7$, where $\Gamma(\cdot)$ denotes the gamma function. Monte Carlo estimates $\hat{p}_{f,MC}$ of $p_f$ based on 10000 independent samples of $X$ are $0.0053$, $0.0$, $0.0$ for $r = 5, 6$, and $7$. Improved Monte Carlo estimates $\hat{p}_{f,IMC}$ of $p_f$ based on the same number of samples for $r = 5$ are $0.0$, $0.0009$, $0.0053$, and $0.0050$ if $\sigma = 0.5, 1.0, 2.0$, and $3.0$. For $r = 6$, the estimates $\hat{p}_{f,IMC}$ are $0.0001 \times 10^{-4}$, $0.1028 \times 10^{-4}$, $0.5697 \times 10^{-4}$, $1.1580 \times 10^{-4}$, and $1.1350 \times 10^{-4}$ if $\sigma = 0.5, 1.0, 2.0, 3.0$, and $4.0$. For $r = 7$, the estimates $\hat{p}_{f,IMC}$ are $0.0$, $0.0016 \times 10^{-6}$, $0.1223 \times 10^{-6}$, $0.6035 \times 10^{-6}$, and $0.4042 \times 10^{-6}$ if $\sigma = 0.5, 1.0, 2.0, 3.0$, and $4.0$. Note that the density $q$ corresponds to an $\mathbb{R}^d$-valued variable with independent Gaussian coordinates with variance $\sigma^2$ and mean 0, except for a coordinate that has mean $r$. Monte Carlo estimates are unsatisfactory for relatively large values of $r$. The accuracy of the estimators $\hat{p}_{f,IMC}$ depends essentially on $q$, for example, they are unsatisfactory for $\sigma = 0.5$ and accurate for $\sigma$ in the range $[3, 4]$. ◊
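
The example can be sketched numerically under one assumption that is not stated in the text as extracted: the dimension is taken to be $d = 10$, a value that reproduces the reported exact probabilities. Since $\|X\|^2$ is chi-square with $d$ degrees of freedom, the exact $p_f$ has a closed form for even $d$. The improved estimator samples from a Gaussian density with variance $\sigma^2$ whose first coordinate is shifted to $r$, and the weight $f/q$ is evaluated in logarithmic form. The value $\sigma = 3$, the seed, and the sample size are also assumed.

```python
import math
import random

def exact_pf(r, d):
    """P(||X|| > r) for X with d independent N(0,1) coordinates, d even:
    chi-square tail exp(-x/2) * sum_{k < d/2} (x/2)^k / k! with x = r^2."""
    if d % 2:
        raise ValueError("closed form used here needs an even dimension")
    half = 0.5 * r * r
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(d // 2))

def improved_mc_pf(r, d, sigma, n, rng):
    """Improved estimator: sample z with independent N(0, sigma^2) coordinates,
    first coordinate shifted by r, and weight by f(z)/q(z)."""
    total = 0.0
    for _ in range(n):
        z = [sigma * rng.gauss(0.0, 1.0) for _ in range(d)]
        z[0] += r
        sq = sum(v * v for v in z)                       # ||z||^2
        if sq > r * r:                                   # z falls outside the sphere
            sq_shift = sq - z[0] ** 2 + (z[0] - r) ** 2  # ||z - r e_1||^2
            log_w = d * math.log(sigma) - 0.5 * sq + 0.5 * sq_shift / sigma ** 2
            total += math.exp(log_w)
    return total / n

d = 10                                   # assumed dimension (see lead-in)
exact = {r: exact_pf(r, d) for r in (5, 6, 7)}
rng = random.Random(0)
estimate_r5 = improved_mc_pf(5, d, 3.0, 50000, rng)
print(exact, estimate_r5)
```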


2.14 Exercises 

Exercise 2.1 Show that $\sigma(\mathcal{A})$ defined by (2.1) is a $\sigma$-field, and that $\sigma(\mathcal{A})$ is the smallest $\sigma$-field including $\mathcal{A}$.

Exercise 2.2 Prove the properties of the probability measure P in (2.2). 

Exercise 2.3 Show that the inclusion-exclusion formula in (2.3) is valid. 

Hint: Use the fourth formula in (2.2) to calculate the probability of $\cup_{i=1}^n A_i$ by viewing this event as the union of $\cup_{i=1}^{m-1} A_i$ and $A_m$ for $m \le n$.

Exercise 2.4 Show that the conditional probability $P(A \mid B)$ in (2.5) is a probability measure on $(\Omega, \mathcal{F})$.

Exercise 2.5 Prove the law of total probability and the Bayes formula in (2.6).

Exercise 2.6 Consider the events $A_1 = \{(6, 2)\}$, $A_2 = \{(6, 2), (4, 4), (1, 6)\}$, and $B = \{\omega = (i, j) \in \Omega : i + j = 8\}$ in the experiment of rolling two dice. Calculate the conditional probabilities $P(A_1 \mid B)$ and $P(A_2 \mid B)$ by using the definition in (2.5) and by direct arguments.

Exercise 2.7 Let $X$ be a random element. Show that the $\sigma(X)$ in Definition 2.13 is a $\sigma$-field and that this field is the smallest with respect to which $X$ is measurable.

Exercise 2.8 Suppose a random variable $X$ defined on a probability space $(\Omega, \mathcal{F}, P)$ takes a finite number of values $a_1, \ldots, a_n \in \mathbb{R}$. Construct the $\sigma$-field $\sigma(X)$ generated by this variable.

Exercise 2.9 Prove Fatou's lemma for sequences of events, that is, show

$$P\big(\liminf_{n \to \infty} A_n\big) \le \liminf_{n \to \infty} P(A_n) \le \limsup_{n \to \infty} P(A_n) \le P\big(\limsup_{n \to \infty} A_n\big),$$

where $\{A_n\}$ are events on a probability space $(\Omega, \mathcal{F}, P)$.

Hint Note that $P(\liminf_{n \to \infty} A_n) = P(\lim_{n \to \infty} \cap_{k \ge n} A_k) = \lim_{n \to \infty} P(\cap_{k \ge n} A_k)$, where the latter equality holds by Theorem 2.6 since $\cap_{k \ge n} A_k$ is an increasing sequence of events. Since $P(\cap_{k \ge n} A_k) \le P(A_n)$, we have $P(\liminf_{n \to \infty} A_n) \le \liminf_{n \to \infty} P(A_n)$. Similar arguments can be used to show $\limsup_{n \to \infty} P(A_n) \le P(\limsup_{n \to \infty} A_n)$. The inequality $\liminf_{n \to \infty} P(A_n) \le \limsup_{n \to \infty} P(A_n)$ is valid since $\{P(A_n)\}$ is a numerical sequence.

Exercise 2.10 Consider two $\mathbb{R}^d$-valued random variables $X$ and $Y$ defined on a probability space $(\Omega, \mathcal{F}, P)$. Show that $P(\{X \le x\} \cap \{Y \le y\}) = P(X \le x)\,P(Y \le y)$ implies the independence of $X$ and $Y$, where the notation $X \le x$ means $(X_1 \le x_1, \ldots, X_d \le x_d)$.

Exercise 2.11 Show that $X$ defined by (2.15) is an $\mathbb{R}^d$-valued random variable and that the collection of simple random variables constitutes a vector space.

Exercise 2.12 Prove Jensen’s inequality in (2.18) for finite-valued simple random 
variables. 

Hint Use the following fact. If $g : \mathbb{R} \to \mathbb{R}$ is convex, then $g$ is continuous and $g(x) = \sup\{l(x) : l(u) \le g(u), \forall u \in \mathbb{R}\}$, where $l$ denotes a linear function.

Exercise 2.13 Prove Fatou’s lemma given by (2.28). 

Exercise 2.14 Show that $X \le Y$ a.s. and $E[|X|], E[|Y|] < \infty$ imply $E[X 1_A] \le
E[Y 1_A]$, $A \in \mathcal{F}$.

Exercise 2.15 Show that the expectation of a positive random variable $X$ is given
by $E[X] = \int_{[0, \infty)} P(X > x)\, dx$.

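Hint The mapping $(x, \omega) \mapsto 1(X(\omega) > x)$ is measurable from $([0, \infty) \times \Omega,
\mathcal{B}([0, \infty)) \times \mathcal{F})$ to $(\{0, 1\}, \mathcal{K})$, where $\mathcal{K} = \{\emptyset, \{0\}, \{1\}, \{0, 1\}\}$. Fubini's
theorem and the equality $\int_{[0, \infty)} 1(X(\omega) > x)\, dx = \int_{[0, X(\omega))} dx = X(\omega)$ give

    $\int_{[0, \infty)} P(X > x)\, dx = \int_\Omega \left[ \int_{[0, \infty)} 1(X(\omega) > x)\, dx \right] P(d\omega) = \int_\Omega X(\omega)\, P(d\omega) = E[X]$.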


Exercise 2.16 Show that the correlation and covariance matrices of a random vector 
with finite variance are positive definite. 

Exercise 2.17 Prove the properties of the distribution function stated following 
Definition 2.32. 

Hint Set $B_n = \{\omega : X(\omega) \le x_n\}$ and $B = \{\omega : X(\omega) \le x\}$, where $\{x_n\}$ is a
decreasing numerical sequence converging to $x$. The sequence of events $B_n$ is decreasing
so that $\lim_{n \to \infty} B_n = \cap_{n=1}^\infty B_n = B$, implying $\lim_{n \to \infty} F(x_n) = \lim_{n \to \infty} P(B_n) =
P(\lim_{n \to \infty} B_n) = P(B) = F(x)$.

Since $F$ is a bounded, increasing, and right continuous function, it can only have
jump discontinuities. Recall that $F$ has a jump discontinuity at $c$ if the left and right
limits of $F$ at $c$ are finite but not equal. To show that $F$ has at most a countable
number of jump discontinuities, consider two distinct jump points $\xi < \xi'$ of $F$ and
the open intervals $I_\xi = (F(\xi-), F(\xi))$ and $I_{\xi'} = (F(\xi'-), F(\xi'))$ associated with
these jumps. Since $\xi \ne \xi'$, there exists $\zeta \in (\xi, \xi')$ such that $F(\xi) \le F(\zeta) \le
F(\xi'-)$, showing that $I_\xi$ and $I_{\xi'}$ are disjoint intervals. The collection of intervals
$I_\xi$ is countable since each $I_\xi$ contains a rational number and the set of rational
numbers is countable. The sum of all jumps of $F$ is $\sum_{\xi \in J} [F(\xi+) - F(\xi-)] =
\sum_{\xi \in J} [F(\xi) - F(\xi-)] \le 1$, where $J$ denotes the collection of jump points of $F$.
Hence, $\varepsilon n_\varepsilon \le 1$ so that $n_\varepsilon \le 1/\varepsilon$, where $n_\varepsilon$ denotes the number of jumps of $F$ larger
than $\varepsilon > 0$.

Exercise 2.18 Show that the central moments $E[(X - \mu)^q]$ of $X \sim N(\mu, \sigma^2)$ are
zero if $q$ is odd and equal to $q!\, \sigma^q / (2^{q/2} (q/2)!)$ if $q$ is even.

Exercise 2.19 Let $X \sim N(0, 1)$ and set $Y = X^2$. Find the covariance matrix of
$(X, Y)$. Are $X$ and $Y$ correlated? Are $X$ and $Y$ independent?

Exercise 2.20 Find the characteristic function of $X = a + bN$, where $a, b \in \mathbb{R}$
are constants and $N$ is a Poisson random variable with intensity $\lambda > 0$, that is, an
$\{0, 1, \ldots\}$-valued variable with probability $P(N = n) = \lambda^n e^{-\lambda}/n!$, $n \ge 0$.

Exercise 2.21 Let X and Y be random variables defined on the same probability
space. Show that (1) if X and Y are independent, they are uncorrelated, (2) uncorrelated
random variables can be dependent, and (3) uncorrelated Gaussian variables are
independent.


Exercise 2.22 Calculate the expectation of the random variable $X_2 \mid (X_1 = z)$ with
density in (2.38).
Exercise 2.23 Show that the characteristic function $\varphi$ of a real-valued random
variable $X$ is positive definite.

Hint The function $\varphi : \mathbb{R} \to \mathbb{C}$ is positive definite if the matrix $\{\varphi(u_k - u_l), k, l =
1, \ldots, n\}$ is positive semi-definite for all $n \ge 1$ and $u_k \in \mathbb{R}$. Note that $0 \le E[Z Z^*] =
\sum_{k,l=1}^n z_k z_l^* \varphi(u_k - u_l)$ for $Z = \sum_{k=1}^n z_k \exp(i u_k X)$ and $z_k \in \mathbb{C}$ arbitrary.

Exercise 2.24 Find the expression of the characteristic function for X ~ y) 
given by (2.43). 

Exercise 2.25 Calculate the moment generating function $m(u) = E[\exp(uX)]$,
$u \in \mathbb{R}$, for $X \sim N(\mu, \sigma^2)$.

Exercise 2.26 Find the properties of the conditional Gaussian vector in (2.44). 

Exercise 2.27 Let $X(\omega) = 2 + \sin(2\pi\omega)$ be a random variable defined on a probability
space $(\Omega = [0, 1], \mathcal{F} = \mathcal{B}[0, 1], P(d\omega) = d\omega)$ and let $A_1 = [0, 1/4)$, $A_2 =
[1/4, 3/4)$, $A_3 = [3/4, 1)$, and $A_4 = \{1\}$ be a measurable partition of $\Omega$. Calculate
the conditional expectation $E[X \mid \mathcal{G}]$, where $\mathcal{G} = \sigma(A_i, i = 1, \ldots, 4)$. Plot
$E[X \mid \mathcal{G}]$ and $E[X]$ against $\omega \in \Omega = [0, 1]$.

Exercise 2.28 Prove Theorem 2.15. 

Exercise 2.29 Prove the relationships (2.52) and (2.53) in Theorem 2.16. 

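Hint The defining relation gives $\int_A X\, dP = \int_A E[X \mid \mathcal{G}_1]\, dP$ for all $A \in \mathcal{G}_1$, so that
$\int_A E[X \mid \mathcal{G}_1]\, dP = \int_A E[X \mid \mathcal{G}_2]\, dP$ under the assumption $E[X \mid \mathcal{G}_1] = E[X \mid \mathcal{G}_2]$.
We have $\int_A X\, dP = \int_A E[X \mid \mathcal{G}_2]\, dP$ for all $A \in \mathcal{G}_1$ so that $E[X \mid \mathcal{G}_2]$ is
$\mathcal{G}_1$-measurable. Conversely, if $E[X \mid \mathcal{G}_2]$ is $\mathcal{G}_1$-measurable, then $E\{E[X \mid \mathcal{G}_2] \mid \mathcal{G}_1\} =
E[X \mid \mathcal{G}_2]$. The proof is completed by using (2.53).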

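Note that $\int_A E\{E[X \mid \mathcal{G}_2] \mid \mathcal{G}_1\}\, dP = \int_A E[X \mid \mathcal{G}_2]\, dP = \int_A X\, dP = \int_A E[X \mid
\mathcal{G}_1]\, dP$ holds for all $A \in \mathcal{G}_1 \subset \mathcal{G}_2$ by the defining relation, which gives the first
equality in (2.53). The second equality in this formula results since $E[X \mid \mathcal{G}_1]$ is
$\mathcal{G}_1$-measurable and therefore $\mathcal{G}_2$-measurable, which gives $E\{E[X \mid \mathcal{G}_1] \mid \mathcal{G}_2\} = E[X \mid \mathcal{G}_1]$.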

Exercise 2.30 Prove (2.55) by noting that $\mathcal{G}$ is generated by the partition $\{B, B^c\}$
of $\Omega$.

Exercise 2.31 Show that if $X$ and $Z$ are random variables on the same probability
space, then $E[X \mid Z] = E[X \mid \sigma(Z)]$, where $E[X \mid Z] = \int u f_{X|Z}(u \mid z)\, du$ and
$f_{X|Z}$ is the density of $X$ conditional on $Z$.

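Exercise 2.32 Let $X$ be a real-valued random variable defined on a probability
space $(\Omega, \mathcal{F}, P)$ and $\mathcal{G}$ be a sub-$\sigma$-field of $\mathcal{F}$. Show that $E[X \mid \mathcal{G}] = E[X]$
for $\mathcal{G} = \{\emptyset, \Omega\}$ and $E[X \mid \mathcal{G}] = X$ for $\mathcal{G} = \mathcal{F}$.

Hint If $\mathcal{G} = \mathcal{F}$, then $E[X \mid \mathcal{G}]$ is $\mathcal{F}$-measurable and $\int_A (X - E[X \mid \mathcal{F}])\, dP = 0$
for all $A \in \mathcal{F}$ by the defining relation, so that $X = E[X \mid \mathcal{F}]$ a.s.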

Exercise 2.33 Let $X$ be a random variable with distribution $F$ and $a \in \mathbb{R}$ such that
$F(a) \in (0, 1)$. Show that

    $E[X \mid \mathcal{G}] = \frac{\int_{(-\infty, a]} x\, dF(x)}{F(a)}\, 1_A + \frac{\int_{(a, \infty)} x\, dF(x)}{1 - F(a)}\, 1_{A^c}$,

where $\mathcal{G} = \{\emptyset, \Omega, A, A^c\}$ and $A = X^{-1}((-\infty, a])$.

Exercise 2.34 Suppose A, and X, in (2.57) have finite variance. Calculate the mean 
and variance of M n in this equation. 

Exercise 2.35 Calculate the mean and the coefficient of variation of the estimator 
Yimc in Example 2.68. 
Chapter 3

Random Functions 


3.1 Introduction 

We have seen that random vectors are finite families of real- or complex-valued
random variables that are completely characterized by their joint distribution.
Random functions can be viewed as countable or uncountable families $\{X_t, t \in I\}$ of
real- or complex-valued random variables or vectors, where $I$ is an index set.

Definition 3.1 Let $(\Omega, \mathcal{F}, P)$ be a probability space, $I$ a countable or uncountable
index set, and $X : I \times \Omega \to \mathbb{R}^d$ a function of two arguments $t \in I$ and $\omega \in \Omega$.
We say that $X$ is an $\mathbb{R}^d$-valued random function if $X(t, \cdot)$ is an $\mathbb{R}^d$-valued random
variable for each $t \in I$. If $I$ is a countable set representing time, then $X$ is said to be
a time series. If $I$ is an uncountable set, then $X$ is a stochastic or random process if
$I$ is a subset of $\mathbb{R}$ representing time, and a random field if $I$ is a subset of
$\mathbb{R}^{d'}$, $d' > 1$, representing space.

The definition extends directly to complex-valued random functions. It is common
to use the abbreviated notation $X(t)$ for the random function $X$, rather than $X(t, \omega)$, which
indicates both arguments of $X$. The functions $X(\cdot, \omega)$ indexed by $\omega \in \Omega$ are the
samples or the sample paths of $X$.

Recall that a random element on a probability space $(\Omega, \mathcal{F}, P)$ is a measurable
function $X : (\Omega, \mathcal{F}) \to (S, \mathcal{S})$, where $S$ is a metric space and $\mathcal{S}$ denotes the
$\sigma$-field generated by the open sets of $S$ (Definition 2.11). If $S$ is a space of functions
depending on temporal and spatial arguments, $X$ is a stochastic process and random
field, respectively. For example, $X(\omega)$ is a real-valued continuous function defined
on $[0, 1]$ if $S = C[0, 1]$. We denote by $X(t, \omega)$ the coordinate $t \in [0, 1]$ of $X(\omega)$, that
is, the value of the continuous function $X(\omega)$ at $t$. This interpretation specifies the sample
properties of $X$, in contrast to Definition 3.1, which does not provide this information.

Example 3.1 Let $X(t)$, $t \in [0, 1]$, be a real-valued stochastic process with $n <
\infty$ continuous samples $\{x_k(t)\}$, $k = 1, \ldots, n$, that is defined on a probability
space $(\Omega, \mathcal{F}, P)$, so that $\{x_k\}$ are members of $C[0, 1]$. If $X : (\Omega, \mathcal{F}) \to
(C[0, 1], \mathcal{B}(C[0, 1]))$ is measurable, the sets $A_k = X^{-1}(\{x_k\})$, $k = 1, \ldots, n$, provide
a measurable partition of $\Omega$, so that $X(t, \omega) = \sum_{k=1}^n x_k(t) 1_{A_k}(\omega)$. By analogy
with random variables, $X$ is called a simple stochastic process. Generally, a random
function has an uncountable number of samples. ◊

Fig. 3.1 Samples of $\{X_i\}$ for $\rho = 0.7$ (left panel) and $\rho = 0.95$ (right panel)

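Definition 3.2 An $\mathbb{R}^d$-valued random function $\{X(t), t \in I\}$, $I \subset \mathbb{R}^{d'}$, on a
probability space $(\Omega, \mathcal{F}, P)$ is called measurable if $X : I \times \Omega \to \mathbb{R}^d$ is measurable
from $(I \times \Omega, \mathcal{B}(I) \times \mathcal{F})$ to $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d))$.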

If $X$ is a measurable random function, then $X(t, \cdot)$ is $\mathcal{F}$-measurable and $X(\cdot, \omega)$
is $\mathcal{B}(I)$-measurable by Fubini's theorem (Theorem 2.9). The first statement is valid
even if $X$ is not measurable by the definition of random functions. There are random
functions that do not have measurable versions ([1], Example 9.4.3). However, most
of the random functions used in applications are measurable, for example, Brownian
motion is a measurable process (Theorem 3.36).

This introductory section is concluded with examples of time series, stochastic 
processes, and random fields. We also introduce the Brownian motion and the com- 
pound Poisson processes and show typical samples of these processes. 

Example 3.2 Let $\{X_i\}$, $i \in I = \{1, 2, \ldots\}$, be a time series defined by the recurrence
formula $X_{i+1} = \rho X_i + \sqrt{1 - \rho^2}\, W_i$, $i = 1, 2, \ldots$, where $|\rho| < 1$ and $\{W_i\}$ are
uncorrelated random variables with mean 0 and variance 1. If the initial state $X_1$ has
mean 0 and variance 1 and is uncorrelated with $\{W_i\}$, then $E[X_i X_{i+k}] = \rho^k$. Figure 3.1
shows five samples of $\{X_i\}$ for iid $W_i \sim N(0, 1)$, $X_1 \sim N(0, 1)$ independent of
$\{W_i\}$, $\rho = 0.7$ (left panel) and $\rho = 0.95$ (right panel). The samples of $\{X_i\}$ for
$\rho = 0.95$ are smoother than those for $\rho = 0.7$ since they have a stronger temporal
dependence. ◊

Proof The expectation of $X_{i+1} = \rho X_i + \sqrt{1 - \rho^2}\, W_i$ is $E[X_{i+1}] = \rho E[X_i]$, $i \in I$,
so that $E[X_i] = 0$, $i \ge 2$, since $E[X_1] = 0$ by assumption. The expectation
of the square of this equation is $\gamma_{i+1} = \rho^2 \gamma_i + (1 - \rho^2)$, where $\gamma_i =
E[X_i^2]$, so that $\gamma_{st} = \lim_{i \to \infty} \gamma_i = 1$ since $|\rho| < 1$. Under the assumption on
$X_1$, $\gamma_i = 1$ at all times. We also have $E[X_i X_{i+1}] = E[X_i (\rho X_i + \sqrt{1 - \rho^2}\, W_i)] =
\rho E[X_i^2]$, $E[X_i X_{i+2}] = E[X_i (\rho (\rho X_i + \sqrt{1 - \rho^2}\, W_i) + \sqrt{1 - \rho^2}\, W_{i+1})] = \rho^2
E[X_i^2]$, and so on for all $i \in I$ since $X_1$ and $\{W_i\}$ are uncorrelated by assumption. ▲

Fig. 3.2 Samples of $X(t) = Y \cos(2\pi t)$, $Y \sim U(0, 1)$

Fig. 3.3 Samples of $X(t_1, t_2)$ for $Y, Z \sim U(0, 1)$
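
The relation $E[X_i X_{i+k}] = \rho^k$ of Example 3.2 can be checked by simulation. The sketch below assumes Gaussian noise and a Gaussian initial state, as in the samples of Fig. 3.1; the sample size, the seed, and the function names `simulate_series` and `lag_correlation` are illustrative choices, not part of the text.

```python
import random

def simulate_series(rho, n_steps, rng):
    """One sample of X_{i+1} = rho*X_i + sqrt(1 - rho^2)*W_i with X_1, W_i ~ N(0, 1)."""
    scale = (1.0 - rho * rho) ** 0.5
    x = [rng.gauss(0.0, 1.0)]                      # initial state X_1
    for _ in range(n_steps - 1):
        x.append(rho * x[-1] + scale * rng.gauss(0.0, 1.0))
    return x

def lag_correlation(samples, i, k):
    """Monte Carlo estimate of E[X_i X_{i+k}] (i is a 0-based position)."""
    return sum(x[i] * x[i + k] for x in samples) / len(samples)

rng = random.Random(0)                             # fixed seed for reproducibility
rho = 0.7
samples = [simulate_series(rho, 10, rng) for _ in range(20000)]

# Compare the estimates with rho**k for a few lags.
for k in range(4):
    print(k, round(lag_correlation(samples, 2, k), 3), round(rho ** k, 3))
```

The estimates fluctuate around $\rho^k$ with a statistical error of order $1/\sqrt{20000}$, so agreement to about two decimal places is what can be expected from this sample size.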

Example 3.3 The function $X(t) = Y \cos(2\pi t)$, $t \in [0, 1]$, $Y \sim U(0, 1)$, is a real-valued
stochastic process since it is a random variable for each $t \in I = [0, 1]$.
Figure 3.2 shows five independent samples of $X$. We say that $X(t)$ is a parametric
process since it is represented by a deterministic function of time depending on a
finite number of random variables. ◊

Example 3.4 Let $Y, Z \sim U(0, 1)$ be independent random variables. The function
$X(t_1, t_2) = Y \cos(\nu_1 t_1) \cos(\nu_2 t_2 + 2\pi Z)$, $(t_1, t_2) \in [0, 1]^2$, is a real-valued random
field, referred to as a parametric random field since $X$ is a deterministic function
depending on a finite number of random variables. Figure 3.3 shows two samples of this
parametric random field. ◊
Fig. 3.4 Samples of a Brownian motion and a compound Poisson process with
$Y_1 \sim N(0, \sigma^2)$ and $\lambda \sigma^2 = 1$



Definition 3.3 A real-valued process $B(t)$, $t \ge 0$, is called a Brownian motion
if (1) $B(0) = 0$ a.s., that is, $P(\{\omega \in \Omega : B(0, \omega) = 0\}) = 1$, (2) for any
$0 \le s < t$, $B(t) - B(s) \sim N(0, t - s)$, (3) the process has independent increments,
that is, for any integer $n \ge 1$ and $0 \le t_1 < t_2 < \cdots < t_n$, the random variables
$B(t_1), B(t_2) - B(t_1), \ldots, B(t_n) - B(t_{n-1})$ are independent, and (4) almost all samples
of $B$ are continuous functions, that is, $P(\{\omega \in \Omega : B(\cdot, \omega) \text{ is continuous}\}) = 1$.

Example 3.5 Let $C(t) = \sum_{k=1}^{N(t)} Y_k$, $t \ge 0$, be a compound Poisson process,
where $N$ is a homogeneous Poisson counting process with intensity parameter $\lambda > 0$
and $\{Y_k\}$, $k = 1, 2, \ldots$, are iid real-valued random variables with finite variance.
Figure 3.4 shows a sample of $C$ for $Y_1 \sim N(0, \sigma^2)$ and $\lambda \sigma^2 = 1$ and a sample of a
Brownian motion process $B(t)$, $t \ge 0$. The sample of $C$ is piecewise constant with jumps
$\{Y_k\}$ at the jump times of $N(t)$ while that of $B$ is continuous. ◊
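
Samples of the kind described in Example 3.5 can be generated as follows. This is a minimal sketch under stated assumptions: the Brownian motion is sampled only on a finite time grid from independent $N(0, \Delta t)$ increments (Definition 3.3), the Poisson jump times are built from iid exponential waiting times, and the jumps are independent of the counting process. The grid size, the seed, the value $\lambda = 10$, and the function names are illustrative.

```python
import math
import random

def brownian_path(t_grid, rng):
    """Values of B on an increasing grid with t_grid[0] = 0, built from
    independent N(0, dt) increments."""
    b = [0.0]
    for t0, t1 in zip(t_grid, t_grid[1:]):
        b.append(b[-1] + rng.gauss(0.0, math.sqrt(t1 - t0)))
    return b

def compound_poisson_path(t_grid, lam, sigma, rng):
    """Values of C(t) = sum_{k <= N(t)} Y_k on the grid, with Y_k ~ N(0, sigma^2)."""
    # Jump times of N: cumulative sums of iid exponential(lam) waiting times.
    jump_times = []
    t = rng.expovariate(lam)
    while t <= t_grid[-1]:
        jump_times.append(t)
        t += rng.expovariate(lam)
    jumps = [rng.gauss(0.0, sigma) for _ in jump_times]
    # C(t) adds up the jumps that occurred at or before t.
    return [sum(y for s, y in zip(jump_times, jumps) if s <= t) for t in t_grid]

rng = random.Random(0)
lam = 10.0
sigma = 1.0 / math.sqrt(lam)                  # so that lam * sigma^2 = 1
t_grid = [i / 1000 for i in range(1001)]      # grid on [0, 1]

b = brownian_path(t_grid, rng)
c = compound_poisson_path(t_grid, lam, sigma, rng)
print(b[-1], c[-1], len(set(c)))              # len(set(c)) - 1 counts the distinct levels after jumps
```

The grid values of `c` take only a few distinct levels, reflecting the piecewise constant samples of $C$, while `b` changes at every grid point.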


3.2 Finite Dimensional Distributions 


Consider an $\mathbb{R}^d$-valued random function $X(t)$, $t \in I \subset \mathbb{R}^{d'}$. If $I$ is uncountable, it is
not possible to specify the joint distribution of all random variables defining $X$. In
this case, $X$ can be characterized partially by its finite dimensional distributions.

Definition 3.4 Let $X(t)$, $t \in I$, be an $\mathbb{R}^d$-valued random function defined on a
probability space $(\Omega, \mathcal{F}, P)$, $n \ge 1$ an integer, and $t_k \in I$, $k = 1, \ldots, n$, arbitrary
distinct arguments. The finite dimensional distribution of order $n$ of $X$ is the joint
distribution of the random vector $(X(t_1), \ldots, X(t_n))$, that is,

    $F_n(x^{(1)}, \ldots, x^{(n)}; t_1, \ldots, t_n) = P(\cap_{k=1}^n \{X(t_k) \le x^{(k)}\})$,    (3.1)

where $X(t_k) \le x^{(k)}$ means $X_i(t_k) \le x_i^{(k)}$ for all $i = 1, \ldots, d$. The distribution
$F(x; t) = F_1(x; t)$ is referred to as the marginal distribution of $X$ at $t$, and represents
the distribution of the $\mathbb{R}^d$-valued random variable $X(t)$.

The definition is meaningful since $\times_{i=1}^d (-\infty, x_i^{(k)}]$ are Borel sets so that $\{X(t_k) \le
x^{(k)}\} \in \mathcal{F}$. As previously, $(-\infty, x^{(k)}]$ means $\times_{i=1}^d (-\infty, x_i^{(k)}]$. Note that $F_n$ are
probability measures induced by $X$ on the measurable space $(\mathbb{R}^{nd}, \mathcal{B}^{nd})$.

Definition 3.5 Two $\mathbb{R}^d$-valued random functions $X(t)$ and $Y(t)$, $t \in I \subset \mathbb{R}^{d'}$, are
said to be versions if their finite dimensional distributions coincide. Versions may or
may not be defined on the same probability space.

Finite dimensional distributions provide a very useful but incomplete characteri- 
zation for random functions, as illustrated by the following two examples. 

Example 3.6 Let $X(t)$ and $Y(t)$, $0 \le t \le 1$, be real-valued random functions on a
probability space $(\Omega = [0, 1], \mathcal{F} = \mathcal{B}[0, 1], P(d\omega) = d\omega)$ such that $X(t, \omega) = 0$
and $Y(t, \omega) = 1(t = \omega)$. These processes have the same finite dimensional
distributions but $\sup_{t \in [0, 1]} X(t) = 0$ and $\sup_{t \in [0, 1]} Y(t) = 1$. Also, $X(t)$ has continuous
samples with probability 1 while almost all samples of $Y(t)$ are not continuous, that
is, $P(X(t) \text{ continuous in } [0, 1]) = 1$ and $P(Y(t) \text{ continuous in } [0, 1]) = 0$. ◊

Example 3.7 Let $X(t)$, $t \in I = [0, 1]$, be a real-valued stochastic process with
known finite dimensional distributions. This information may be insufficient for
calculating statistics of some functionals of $X$. For example, the event
$\{|X(t)| \le a, 0 \le t \le 1\} = \cap_{0 \le t \le 1} \{|X(t)| \le a\}$, $a > 0$, depends on an
uncountable number of random variables so that its probability cannot be obtained
from the finite dimensional distributions of $X$. Moreover, $\cap_{0 \le t \le 1} \{|X(t)| \le a\}$ may
not be measurable since it is the intersection of an uncountable number of events. ◊

Definition 3.6 Let $X$ be a real-valued random function with finite dimensional
distributions in (3.1). The corresponding finite dimensional densities of $X$ can be calculated
from

    $f_n(x^{(1)}, \ldots, x^{(n)}; t_1, \ldots, t_n) = \frac{\partial^n F_n(x^{(1)}, \ldots, x^{(n)}; t_1, \ldots, t_n)}{\partial x^{(1)} \cdots \partial x^{(n)}}$,    (3.2)

provided they exist. The marginal density of $X$ at $t$ is $f(x; t) = f_1(x; t)$. These
definitions extend directly to vector- and complex-valued random functions.

Given a random function, its finite dimensional distributions of any order are well
defined. However, the converse may not be true, that is, it may not be possible to construct
a random function whose finite dimensional distributions match an arbitrary family
of probability measures.

Example 3.8 Suppose we attempt to construct a real-valued process $X(t)$, $t \in [0, 1]$,
with finite dimensional distributions $F_n(x_1, \ldots, x_n; t_1, \ldots, t_n) = \prod_{k=1}^n \Phi(x_k)$,
$n \ge 1$, and continuous samples. The construction is not possible since (1) the
sequence of events $A_n = \{\omega : X(t, \omega) > \varepsilon, X(t + 1/n, \omega) < -\varepsilon\}$ converges to the
empty set as $n \to \infty$ by the required continuity of the samples of $X$ so that $P(A_n) \to
P(\emptyset) = 0$ as $n \to \infty$ for every $\varepsilon > 0$ and (2) $P(X(t) > \varepsilon, X(s) < -\varepsilon) = (\Phi(-\varepsilon))^2$
for any $t \ne s$ and $\varepsilon > 0$ by the definition of $F_n$ so that $P(A_n) = (\Phi(-\varepsilon))^2 \not\to 0$ as
$n \to \infty$, which leads to a contradiction. ◊

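Definition 3.7 A family of probability measures $P_{t_1, \ldots, t_n}(A_1 \times \cdots \times A_n)$, $A_k \in
\mathcal{B}(\mathbb{R}^d)$, $n = 1, 2, \ldots$, $d \ge 1$, satisfies the Kolmogorov consistency criteria if

    $P_{t_1, \ldots, t_n}(A_1 \times \cdots \times A_n) = P_{t_{\pi(1)}, \ldots, t_{\pi(n)}}(A_{\pi(1)} \times \cdots \times A_{\pi(n)})$, and

    $P_{t_1, \ldots, t_n, t_{n+1}}(A_1 \times \cdots \times A_n \times \mathbb{R}^d) = P_{t_1, \ldots, t_n}(A_1 \times \cdots \times A_n)$,    (3.3)

where $\{\pi(1), \ldots, \pi(n)\}$ is an arbitrary permutation of $\{1, \ldots, n\}$.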

If a family of probability measures satisfies the Kolmogorov consistency crite- 
ria, there exists a random function whose finite dimensional distributions match this 
family ([2], Sect. 1.5, [3], Theorem 1.1.16). This result is particularly useful in appli- 
cations since it is common to postulate a family of probability measures, infer its 
unspecified parameters from the available information, and assume that the resulting 
measures are the finite dimensional distributions of a random function. 

Definition 3.8 A random function $X(t)$ is said to be stationary if its finite dimensional
distributions are invariant under argument translation, that is,

    $(X(t_1), \ldots, X(t_n)) \stackrel{d}{=} (X(t_1 + \tau), \ldots, X(t_n + \tau))$,    (3.4)

or equivalently, $F_n(x^{(1)}, \ldots, x^{(n)}; t_1, \ldots, t_n) = F_n(x^{(1)}, \ldots, x^{(n)}; t_1 + \tau, \ldots, t_n + \tau)$,
for arbitrary $n \ge 1$, $t_1, \ldots, t_n \in \mathbb{R}^{d'}$, and $\tau \in \mathbb{R}^{d'}$. If $X$ is a random field satisfying
(3.4), we say that $X$ is a stationary or homogeneous random field.

Example 3.9 Let $(X_1, X_2, \ldots)$ be a time series, where $X_i$, $i = 1, 2, \ldots$, are
iid random variables. The series is stationary since the vectors $(X_1, \ldots, X_n)$ and
$(X_{1+m}, \ldots, X_{n+m})$ have the same distribution for any integers $m, n \ge 1$. ◊


3.3 Sample Properties 


Definition 3.9 An $\mathbb{R}^d$-valued random function $X(t)$, $t \in I$, is (1) continuous in
probability at $t \in I$ if $\lim_{\|s - t\| \to 0} P(\|X(s) - X(t)\| > \varepsilon) = 0$ for all $\varepsilon > 0$, (2)
continuous in the $p$'th mean at $t$ if $\lim_{\|s - t\| \to 0} E[\|X(s) - X(t)\|^p] = 0$, and (3) almost
surely (a.s.) continuous at $t$ if $P(\{\omega : \lim_{\|s - t\| \to 0} \|X(s, \omega) - X(t, \omega)\| = 0\}) = 1$.
If $p = 2$ in (2), we say that $X$ is m.s. continuous at $t$.

A random function $X$ is continuous in probability, continuous in the $p$'th mean,
and almost surely continuous in $I$ if it is continuous in probability, continuous in the
$p$'th mean, and almost surely continuous at each $t \in I$, respectively.
Note that almost sure continuity at each time $t$ does not imply sample continuity.
If $X$ is a.s. continuous at $t \ge 0$, then $\Omega_t = \{\omega \in \Omega : \lim_{s \to t} X(s, \omega) \ne X(t, \omega)\}$
has probability zero. However, the probability of $\cup_{t \ge 0} \Omega_t$ may not be zero or may
not even be defined since this set is an uncountable union of measurable sets, which
is not necessarily measurable.

Example 3.10 Let $(\Omega = [0, 1], \mathcal{F} = \mathcal{B}([0, 1]), P(d\omega) = d\omega)$ be a probability space
and let $X$ be a real-valued stochastic process on this space defined by $X(t, \omega) = 1(t \ge
\omega)$, $t \in [0, 1]$. Since $\Omega_t = \{t\}$, $P(\Omega_t) = 0$ so that $X$ is continuous a.s. at each
$t \in [0, 1]$. However, the probability of $\cup_{t \in [0, 1]} \Omega_t = [0, 1] = \Omega$ is equal to 1 so that
the process is not sample continuous. ◊

Example 3.11 Let $C(t)$ and $B(t)$ denote the compound Poisson and the Brownian
motion processes in Example 3.5, and assume $Y_1 \in L_2(\Omega, \mathcal{F}, P)$. The Brownian
motion and the compound Poisson processes are continuous in probability, m.s.
continuous, and a.s. continuous at each $t \ge 0$. The Brownian motion has continuous
samples while the compound Poisson process does not. ◊

Proof Since $P(|B(t) - B(s)| > \varepsilon) = 2\Phi(-\varepsilon/\sqrt{|t - s|}) \to 0$ as $|s - t| \to 0$ for
any $\varepsilon > 0$, the Brownian motion process is continuous in probability at $t$. The
event $\{|C(t) - C(s)| > \varepsilon\}$ can occur only if $C$ has at least one jump in the time
interval $(s, t]$. Since the probability that $C$ has at least one jump in $(s, t]$ is $P(N(t - s) > 0) =
1 - \exp[-\lambda(t - s)]$, and $P(N(t - s) > 0) \to 0$ as $|s - t| \to 0$, the compound Poisson
process is continuous in probability at $t \ge 0$. The processes $B$ and $C$ are continuous
in probability at each time $t \ge 0$, so that they are continuous in probability in $[0, \infty)$.

Since the expectations $E[(B(t) - B(s))^2] = t - s$ and $E[(C(t) - C(s))^2] =
\lambda(t - s) E[Y_1^2] + (\lambda(t - s) E[Y_1])^2$ ([4], Sect. 3.3) converge to 0 as $|s - t| \to 0$, the
Brownian motion and the compound Poisson processes are m.s. continuous at $t$.

Since $B(t) - B(t - 1/n)$ is a Gaussian variable with mean zero and variance $1/n$
for any integer $n \ge 1$, the events $A_n(\varepsilon) = \{|B(t) - B(t - 1/n)| > \varepsilon\}$, $n = 1, 2, \ldots$,
are such that

    $P(\liminf_{n \to \infty} A_n(\varepsilon)) \le \liminf_{n \to \infty} P(A_n(\varepsilon)) = \liminf_{n \to \infty} 2(1 - \Phi(\sqrt{n}\, \varepsilon)) = 0$, $\varepsilon > 0$,

by Fatou's lemma (Theorem 2.11), so that $B$ is continuous a.s. at $t$. For $C$, set $A_n =
\{|C(t) - C(t - 1/n)| > 0\}$, $n = 1, 2, \ldots$, and note that $P(A_n) = 1 - e^{-\lambda/n} \to 0$
as $n \to \infty$. Since $P(\liminf_n A_n) \le \liminf_n P(A_n)$, the probability of $\Omega_t = \{\omega \in
\Omega : \lim_{s \to t} C(s, \omega) \ne C(t, \omega)\}$ is 0, so that $C$ is a.s. continuous at $t$.

The Brownian motion process has continuous samples since $E[(B(t + h) -
B(t))^4] = 3h^2$ for any $t \ge 0$ and $h > 0$, so it satisfies the Kolmogorov criterium
stated later in this section (Theorem 3.1) with $\alpha = 4$, $c \ge 3$, and $\beta = 1$. It can be
seen from Fig. 3.4 that $C$ is not sample continuous. ▲

We have seen in Examples 3.6 and 3.7 that finite dimensional distributions provide
insufficient information for calculating probabilities of various functionals of random
functions. Two options are available to overcome this difficulty. The first is to assume
that a random function $X(t)$, $t \in I$, satisfies certain regularity conditions so that its
samples can be characterized completely by their values on a countable set $I^* \subset I$
that is dense in $I$. Under this assumption, distributions of functionals of $X(t)$ can be
obtained from values of $X(t)$ at $t \in I^*$, so that they can be calculated from the finite
dimensional distributions of $X(t)$. For example, the distribution of $\sup_{t \in I} X(t)$ can be
derived from that of $\sup_{t \in I^*} X(t)$.

The second option is to consider explicit analytical expressions for a random
function, for example, $X(t) = h(t, Z)$, $t \in I$, where $Z$ is an $\mathbb{R}^d$-valued random
variable and $h$ is a measurable function. These representations, referred to as
parametric random functions, are used extensively in applications, for example,
truncated Karhunen-Loeve expansions (Sect. 3.6.5) and discrete spectral representations
(Sect. 3.8).

Following is a brief discussion on sample properties for real-valued random func- 
tions that includes the Kolmogorov continuity criterium. The reader is referred to 
[2] (Chap. 3), [5] (Sects. 3.2 and 3.3), and [1] (Chaps. 9-12) for a comprehensive 
discussion on sample properties of random functions. 

Definition 3.10 Let $X(t)$ and $Y(t)$, $t \in I \subset \mathbb{R}^{d'}$, be random functions defined on
the same probability space. They are said to be indistinguishable if their samples
coincide a.s., that is, the subset $\Omega_0$ of $\Omega$ on which the samples of $X$ and $Y$ differ
has probability 0. The random functions $X$ and $Y$ are said to be modifications if
$P(\{\omega : X(t, \omega) = Y(t, \omega)\}) = 1$ at each $t \in I$.

In contrast to versions that may be defined on distinct probability spaces,
indistinguishable processes and modifications must be defined on the same probability
space. Also, indistinguishable processes are modifications but the converse is not
generally true. For example, although $\Omega_t = \{\omega : X(t, \omega) \ne Y(t, \omega)\}$ is measurable
and $P(\Omega_t) = 0$ at each $t$ if $X$ and $Y$ are modifications, the set $\cup_{t \in I} \Omega_t$ may not even
be in $\mathcal{F}$.

It is common to replace a random function $X$ by, for example, a modification of it
whose samples have more desirable properties. For example, Brownian motion has a
modification $B$ that has continuous samples a.s. ([6], Theorem 26). This modification
is said to be the Brownian motion process, and is used exclusively in our discussion.

Definition 3.11 A real-valued random function $X(t)$, $t \in I \subset \mathbb{R}^{d'}$, on a probability
space $(\Omega, \mathcal{F}, P)$ is said to be separable if there is a countable subset $I^* \subset I$ and
an event $\Omega_0$ with $P(\Omega_0) = 0$ such that for any closed subset $B$ of $\mathbb{R}$ the events
$\{\omega : X(t, \omega) \in B, t \in I^*\}$ and $\{\omega : X(t, \omega) \in B, t \in I\}$ differ by a measurable
subset of $\Omega_0$. The set $I^*$ is called a separant for $X$.

We note that (1) the finite dimensional distributions of separable random functions
$X$ provide adequate information for calculating the probability of $\sup_{t \in I} X(t)$, $I \subset
\mathbb{R}^{d'}$, and other functionals of $X(t)$ involving values of this random function over
uncountable sets, (2) there are non-separable random functions ([7], p. 44), and (3)
separable random functions may not have smooth samples ([1], Example 9.2.2).

Theorem 3.1 (Kolmogorov's continuity criterium) Let $X(t)$, $t \in I$, be a real-valued
separable stochastic process and $I$ a bounded interval of the real line. If there exist
constants $\alpha, \beta, c > 0$ such that

    $E[|X(t + h) - X(t)|^\alpha] \le c\, h^{1 + \beta}$,    (3.5)

then $\sup_{s, t \in I, |s - t| < h} |X(s) - X(t)| \to 0$ a.s. as $h \to 0$, that is, almost every sample of $X$ is a
uniformly continuous function on $I$ ([7], Propositions 4.1 and 4.2).

Following are facts related to separable processes that facilitate the use of 
Kolmogorov’s continuity criterium. 

Theorem 3.2 Let $X(t)$, $t \in I$, be a stochastic process defined on a probability space
$(\Omega, \mathcal{F}, P)$. There exists a process $\tilde{X}(t)$, $t \in I$, defined on the same space that is
separable and $P(\tilde{X}(t) = X(t)) = 1$ for each $t \in I$ ([7], Proposition 2.1).

Theorem 3.3 Let $X(t)$, $t \in I$, be a separable process that is continuous in probability.
Every countable set $I^*$ dense in $I$ is a separant ([7], Proposition 2.2).

Theorem 3.4 Let $X(t)$, $t \in I$, be a stochastic process defined on a probability space
$(\Omega, \mathcal{F}, P)$ that is continuous in probability. There exists a separable and measurable
process $\tilde{X}(t)$, $t \in I$, defined on the same space such that $P(\tilde{X}(t) = X(t)) = 1$, and
any countable set $I^*$ dense in $I$ is a separant for $\tilde{X}(t)$, $t \in I$ ([7], Proposition 2.3).

Example 3.12 Let $X(t)$ be a separable, real-valued stationary Gaussian process with
mean 0 and covariance function $c(\tau) = E[X(t + \tau) X(t)] = \exp(-\lambda |\tau|)$, $\lambda > 0$.
The process has uniformly continuous samples since the condition in (3.5) holds
with $\alpha = 4$, $c = 12\lambda^2$, and $\beta = 1$. ◊

Proof Since $X(t + h) - X(t) \sim N(0, 2(1 - \exp(-\lambda |h|)))$ and $1 - \exp(-|u|) \le
|u|$, we have $1 - \exp(-\lambda |h|) \le \lambda |h|$, implying $E[(X(t + h) - X(t))^4] = 12(1 -
\exp(-\lambda |h|))^2 \le 12 \lambda^2 |h|^2$. ▲
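
The bound used in the proof of Example 3.12 can be verified numerically. The short sketch below evaluates both sides of (3.5) with $\alpha = 4$, $c = 12\lambda^2$, and $\beta = 1$, using the Gaussian identity $E[G^4] = 3\,(\mathrm{Var}[G])^2$ for a zero-mean Gaussian variable $G$; the value $\lambda = 2$ and the function names are illustrative assumptions.

```python
import math

def fourth_moment_increment(h, lam):
    """E[(X(t+h) - X(t))^4] for a zero-mean Gaussian increment with variance
    2*(1 - exp(-lam*|h|)), using E[G^4] = 3*Var(G)^2."""
    var = 2.0 * (1.0 - math.exp(-lam * abs(h)))
    return 3.0 * var ** 2

def kolmogorov_bound(h, lam):
    """Right side c*h^(1+beta) of (3.5) with c = 12*lam^2 and beta = 1."""
    return 12.0 * lam ** 2 * abs(h) ** 2

lam = 2.0
for h in (1e-3, 1e-2, 1e-1, 1.0):
    print(h, fourth_moment_increment(h, lam), kolmogorov_bound(h, lam))
```

For small $h$ the two sides nearly coincide, since $1 - \exp(-\lambda |h|) \approx \lambda |h|$, and the gap widens as $h$ grows.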


3.4 Second Moment Properties 


The random functions in this section are assumed to have finite variance, so that they
are members of $L_2(\Omega, \mathcal{F}, P)$.

Definition 3.12 Let $\mu(t) = E[X(t)]$, $r(s, t) = E[X(s) X(t)']$, and $c(s, t) =
E[(X(s) - \mu(s))(X(t) - \mu(t))']$ denote the mean, correlation, and covariance
functions of an $\mathbb{R}^d$-valued random function $X(t)$, $t \in I \subset \mathbb{R}^{d'}$, defined on a probability
space $(\Omega, \mathcal{F}, P)$. The pairs $(\mu, r)$ and $(\mu, c)$, referred to as the second moment
properties of $X$, provide equivalent information since $c(s, t) = r(s, t) - \mu(s) \mu(t)'$.

The second moments $E[X_i(t)^2]$ of the coordinates $X_i(t)$ of $X(t)$ exist and are
finite since $X \in L_2(\Omega, \mathcal{F}, P)$. Similarly, the mean functions $\mu_i(t) = E[X_i(t)]$
and the correlation and the covariance functions, $r_{ij}(s, t) = E[X_i(s) X_j(t)]$
and $c_{ij}(s, t) = E[(X_i(s) - \mu_i(s))(X_j(t) - \mu_j(t))]$, exist and are finite since
$|E[X_i(t)]| \le (E[X_i(t)^2])^{1/2}$ and $|E[X_i(s) X_j(t)]| \le (E[X_i(s)^2] E[X_j(t)^2])^{1/2}$ by
the Cauchy-Schwarz inequality (B.17).

If $X$ is a $\mathbb{C}^d$-valued random function, its second moment properties are $\mu(t) =
E[X(t)]$ and $r(s, t) = E[X(s)(X(t)^*)']$ or $c(s, t) = E[(X(s) - \mu(s))((X(t) -
\mu(t))^*)']$, where $z^*$ denotes the complex conjugate of $z \in \mathbb{C}$.

Definition 3.13 A random function $X$ is said to be weakly stationary if its mean is
constant and its correlation function depends only on the argument difference, that
is, $\mu(t) = \mu$ is constant and $r(s, t)$ is a function of $s - t$ rather than $s$ and $t$. Note that
if $r(s, t)$ is a function of $s - t$ so is $c(s, t)$ since $\mu(t) = \mu$ is time/space invariant. We
use the notations $r(s - t)$ and $c(s - t)$ for $r(s, t)$ and $c(s, t)$ if $X$ is a weakly stationary
function.

Definition 3.14 The coordinates $X_i$ and $X_j$ of an $\mathbb{R}^d$-valued random function $X(t)$,
$t \in I$, are said to be orthogonal and uncorrelated if $r_{ij}(s, t) = E[X_i(s) X_j(t)] = 0$
and $c_{ij}(s, t) = E[(X_i(s) - \mu_i(s))(X_j(t) - \mu_j(t))] = 0$, respectively, for all $s, t \in I$,
where $\mu_i(t) = E[X_i(t)]$.

Theorem 3.5 The correlation function of an $\mathbb{R}^d$-valued random function $X(t)$, $t \in
\mathbb{R}^{d'}$, has the properties: (1) $r_{ij}(s, t) = r_{ji}(t, s)$, (2) $|r_{ij}(s, t)|^2 \le r_{ii}(s, s)\, r_{jj}(t, t)$,
and (3) $r_{ii}(s, t)$ is positive definite. If $X$ is weakly stationary, then $r_{ij}(\tau) =
r_{ji}(-\tau)$, $|r_{ij}(\tau)|^2 \le r_{ii}(0)\, r_{jj}(0)$, and $r_{ii}(\tau)$ is positive definite, where $\tau = s - t$.
The covariance function of $X$ has similar properties.

Proof The first property follows from the definition of $r_{ij}$. The second property
results from the Cauchy-Schwarz inequality. The third property holds since $0 \le
E[(\sum_{k=1}^n a_k X_i(t_k))^2] = \sum_{k,l=1}^n a_k a_l r_{ii}(t_k, t_l)$ for any integer $n \ge 1$, $a_k \in \mathbb{R}$, and
$t_k \in \mathbb{R}^{d'}$. Similar arguments yield the other properties. ▲

Example 3.13 Let $X$ be a $\mathbb{C}^d$-valued random function in $L_2(\Omega, \mathcal{F}, P)$, that is, its
coordinates are complex-valued and $|E[X_k(s) X_l(t)^*]| < \infty$ for all $k, l = 1, \ldots, d$,
and arguments $s$ and $t$. The correlation functions $r_{k,l}(t, s) = E[X_k(t) X_l(s)^*]$, $k, l =
1, \ldots, d$, satisfy the condition $r_{k,l}(t, s) = r_{l,k}(s, t)^*$, where $z^*$ denotes the complex
conjugate of $z \in \mathbb{C}$. ◊

Proof Let $U_k$ and $V_k$ denote the real and imaginary parts of $X_k$. Then

    $r_{k,l}(s, t) = E[(U_k(s) + i V_k(s))(U_l(t) - i V_l(t))]$
    $= E[U_k(s) U_l(t) + V_k(s) V_l(t)] + i E[V_k(s) U_l(t) - U_k(s) V_l(t)]$

and is equal to $r_{l,k}(t, s)^* = (E[X_l(t) X_k(s)^*])^* = E[X_l(t)^* X_k(s)] = r_{k,l}(s, t)$.
Other properties of the correlation functions for complex-valued processes can be
derived in a similar manner. ▲

Example 3.14 Let $A_k$, $B_k$ be uncorrelated, real-valued random variables with mean
zero and unit variance and let $\sigma_k, \nu_k > 0$, $k = 1, 2, \ldots, n$, be some constants. The
function

    $X(t) = \sum_{k=1}^n \sigma_k [A_k \cos(\nu_k t) + B_k \sin(\nu_k t)]$, $t \ge 0$,    (3.6)

is a real-valued parametric stochastic process with mean zero and covariance function

    $c(s, t) = \sum_{k=1}^n \sigma_k^2 \cos(\nu_k (s - t))$,    (3.7)

so that $X$ is weakly stationary. If in addition $A_k$, $B_k$ are Gaussian variables, then $X$ is
a stationary Gaussian process (Exercise 3.2). ◊

Proof The definition of $X$ and the properties of $(A_k, B_k)$ imply $E[X(t)] = 0$. The
covariance function of $X$ is

    $c(s, t) = \sum_{k,l=1}^n \sigma_k \sigma_l E[(A_k \cos(\nu_k s) + B_k \sin(\nu_k s))(A_l \cos(\nu_l t) + B_l \sin(\nu_l t))]$
    $= \sum_{k=1}^n \sigma_k^2 (\cos(\nu_k s) \cos(\nu_k t) + \sin(\nu_k s) \sin(\nu_k t)) = \sum_{k=1}^n \sigma_k^2 \cos(\nu_k (s - t))$

by the linearity of the expectation operator and the properties of $(A_k, B_k)$. ▲
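
The covariance function in (3.7) can be compared with a Monte Carlo estimate. The sketch below assumes independent standard Gaussian $A_k$, $B_k$, which is one admissible choice of uncorrelated variables with mean zero and unit variance; the amplitudes, frequencies, time points, sample size, and function names are illustrative.

```python
import math
import random

sigma = [1.0, 0.5, 0.25]      # illustrative amplitudes sigma_k
nu = [1.0, 3.0, 7.0]          # illustrative frequencies nu_k

def sample_process(times, rng):
    """One sample of X in (3.6) evaluated at the given times; the same (A_k, B_k)
    are used at all times of the sample."""
    a = [rng.gauss(0.0, 1.0) for _ in sigma]
    b = [rng.gauss(0.0, 1.0) for _ in sigma]
    return [
        sum(sk * (ak * math.cos(vk * t) + bk * math.sin(vk * t))
            for sk, vk, ak, bk in zip(sigma, nu, a, b))
        for t in times
    ]

def exact_covariance(s, t):
    """Covariance function (3.7)."""
    return sum(sk ** 2 * math.cos(vk * (s - t)) for sk, vk in zip(sigma, nu))

rng = random.Random(1)
s, t = 0.3, 1.1
pairs = [sample_process([s, t], rng) for _ in range(50000)]
estimate = sum(xs * xt for xs, xt in pairs) / len(pairs)   # X has mean zero
print(estimate, exact_covariance(s, t))
```

Since the estimate depends on $s$ and $t$ only through $s - t$ up to sampling error, repeating the run with shifted time points gives a simple check of weak stationarity.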
Example 3.15 An alternative representation of $X$ in Example 3.14 is

    $X(t) = \sum_{k=-n}^n \sigma_k C_k e^{i \nu_k t}$, $t \ge 0$,    (3.8)

where $C_k = (A_k - i B_k)/2$ for all $k$ with $A_{-k} = A_k$, $B_{-k} = -B_k$, $\nu_{-k} = -\nu_k$,
and $\sigma_{-k} = \sigma_k$ for $k = 1, \ldots, n$, $E[C_k] = 0$, and $E[C_k C_l^*] = \delta_{kl}/2$. The correlation
function of $X$ is $c(\tau) = E[X(t + \tau) X(t)^*] = \sum_{k=-n}^n (\sigma_k^2/2)\, e^{i \nu_k \tau}$. ◊

Proof The identities e ±,a — cos (a) ± i sin(o!) and (3.6) give the representation of 
X in (3.8). ▲ 

Example 3.16 The second moment properties of the Brownian motion $B(t)$ and compound Poisson process $C(t)$ are $E[B(t)] = 0$, $E[B(s) B(t)] = s \wedge t$, $E[C(t)] = \lambda t E[Y_1]$, and $E[(C(s) - E[C(s)])(C(t) - E[C(t)])] = \lambda (s \wedge t) E[Y_1^2]$, so that $B(t)$ and $C(t)$ are equal in the second moment sense if $E[Y_1] = 0$ and $\lambda E[Y_1^2] = 1$. ◊

Proof Since $B(t) \sim N(0, t)$ and has independent increments, we have $E[B(t)] = 0$ and, for $s \le t$, $E[B(s) B(t)] = E[B(s)(B(t) - B(s))] + E[B(s)^2] = E[B(s)^2] = s$. For the compound Poisson process, we have $E[C(t)] = E\{E[\sum_{k=1}^{N(t)} Y_k \mid N(t)]\} = E\{N(t) E[Y_1]\} = \lambda t E[Y_1]$. For $s \le t$, $E[C(s) C(t)] = E[C(s)(C(t) - C(s) + C(s))] = E[C(s)(C(t) - C(s))] + E[C(s)^2] = E[C(s)] E[C(t) - C(s)] + E[C(s)^2]$, where the latter equality holds since the compound Poisson process has independent increments. Also, $E[C(s)^2] = E\{E[\sum_{i,j=1}^{N(s)} Y_i Y_j \mid N(s)]\} = E\{N(s) E[Y_1^2] + (N(s)^2 - N(s)) E[Y_1]^2\} = \lambda s E[Y_1^2] + (\lambda s)^2 E[Y_1]^2$, so that the correlation function of $C$ is $E[C(s) C(t)] = (\lambda s E[Y_1])(\lambda t E[Y_1]) + \lambda (s \wedge t) E[Y_1^2]$. ▲
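The second moment equivalence in Example 3.16 can be illustrated numerically. The sketch below is an illustration under stated assumptions rather than a construction from the text: jumps are taken as zero-mean Gaussian with variance $1/\lambda$, so that $E[Y_1] = 0$ and $\lambda E[Y_1^2] = 1$, and the intensity, times, sample size, and seed are hypothetical.

```python
import random


def compound_poisson_pair(s, t, lam, jump_sd, rng):
    """One sample of (C(s), C(t)), s <= t, with N(0, jump_sd^2) jumps (assumed law)."""
    c_s = c_t = 0.0
    time = rng.expovariate(lam)       # first jump time of the Poisson process
    while time <= t:
        y = rng.gauss(0.0, jump_sd)   # jump size Y_k
        c_t += y
        if time <= s:
            c_s += y
        time += rng.expovariate(lam)  # add the next interarrival time
    return c_s, c_t


def correlation_estimate(s, t, lam, jump_sd, n_samples=40000, seed=1):
    """Monte Carlo estimate of E[C(s) C(t)]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        c_s, c_t = compound_poisson_pair(s, t, lam, jump_sd, rng)
        total += c_s * c_t
    return total / n_samples


lam = 2.0                    # hypothetical intensity
jump_sd = (1.0 / lam) ** 0.5  # chosen so that lam * E[Y_1^2] = 1
r_hat = correlation_estimate(1.0, 2.0, lam, jump_sd)
print(f"estimate of E[C(1) C(2)] = {r_hat:.3f}; Brownian motion value is min(1, 2) = 1")
```

The estimate is close to $s \wedge t$, the correlation function of the Brownian motion, although the samples of the two processes look entirely different.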

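Example 3.17 Let $\zeta : \mathbb{R}^{d'} \to \mathbb{R}$ be a real-valued function that is bounded and satisfies the condition $\int_A d\zeta(\lambda) \ge 0$ on all Borel sets $A$ in $\mathbb{R}^{d'}$. Then its Fourier transform

$$q(t) = \int_{\mathbb{R}^{d'}} e^{i t' \nu}\, d\zeta(\nu), \quad t \in \mathbb{R}^{d'}, \tag{3.9}$$

is a continuous and positive definite function. ◊

Proof Since $e^{i t' \nu}$ is a continuous function of $t$, so is $q$. Note that $q$ is positive definite if the matrix $\{q(t_k - t_l)\}$ is Hermitian and non-negative definite for arbitrary $t_k \in \mathbb{R}^{d'}$, $k = 1, \ldots, n$, and $n \ge 1$, that is, $\sum_{k,l=1}^{n} a_k a_l^*\, q(t_k - t_l) \ge 0$, $a_k \in \mathbb{C}$. The latter sum is $\int_{\mathbb{R}^{d'}} \sum_{k=1}^{n} a_k e^{i t_k' \nu} \sum_{l=1}^{n} a_l^* e^{-i t_l' \nu}\, d\zeta(\nu) = \int_{\mathbb{R}^{d'}} \big|\sum_{k=1}^{n} a_k e^{i t_k' \nu}\big|^2 d\zeta(\nu)$, which is non-negative by the postulated properties of $\zeta$. ▲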
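The quadratic form used in the proof can be evaluated directly for a simple case. The sketch below assumes $d' = 1$ and a discrete $\zeta$ with non-negative masses at a few frequencies, so that $q(t) = \sum_j m_j e^{i t \nu_j}$; the masses, frequencies, and time points are hypothetical.

```python
import cmath
import random


def q(t, masses, freqs):
    """Fourier transform (3.9) of a discrete measure with masses at freqs (d' = 1)."""
    return sum(m * cmath.exp(1j * t * v) for m, v in zip(masses, freqs))


def quadratic_form(a, times, masses, freqs):
    """sum_{k,l} a_k conj(a_l) q(t_k - t_l)."""
    total = 0j
    for a_k, t_k in zip(a, times):
        for a_l, t_l in zip(a, times):
            total += a_k * a_l.conjugate() * q(t_k - t_l, masses, freqs)
    return total


rng = random.Random(0)
masses = [0.5, 1.0, 0.2]   # hypothetical non-negative masses of zeta
freqs = [-1.0, 0.3, 2.0]   # hypothetical frequencies carrying the masses
times = [rng.uniform(-5.0, 5.0) for _ in range(6)]

forms = []
for _ in range(100):
    a = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in times]
    forms.append(quadratic_form(a, times, masses, freqs))

print("smallest real part:", min(f.real for f in forms))
print("largest |imaginary part|:", max(abs(f.imag) for f in forms))
```

For every random coefficient vector the form is real up to rounding and non-negative, as the identity $\sum_j m_j |\sum_k a_k e^{i t_k \nu_j}|^2 \ge 0$ predicts.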


The converse of the statement in Example 3.17 is given by the following 
theorem that is particularly useful in applications involving weakly stationary random 
functions. 


Theorem 3.6 (Bochner’s theorem) A continuous function $q : \mathbb{R}^{d'} \to \mathbb{C}$ is positive definite if and only if it admits the representation

$$q(t) = \int_{\mathbb{R}^{d'}} e^{i t' \lambda}\, d\zeta(\lambda), \tag{3.10}$$

where $t \in \mathbb{R}^{d'}$ and $\zeta(\lambda)$ is a bounded, real-valued function satisfying $\int_A d\zeta(\lambda) \ge 0$ for all measurable Borel sets $A$ in $\mathbb{R}^{d'}$ ([2], Theorem 2.1.2, [8], Sect. 7.4).


3.5 Weakly Stationary Random Functions 

This section presents properties of weakly stationary stochastic processes and random fields that are relevant for applications. The class of weakly stationary random functions was introduced in Definition 3.13.


3.5.1 R-Valued Stochastic Processes 

Suppose $X$ is a real-valued weakly stationary stochastic process with mean 0 and correlation/covariance function $r(\tau) = c(\tau) = E[X(t + \tau) X(t)]$. We have seen that $r$ is positive definite. If in addition $r$ is continuous, Bochner’s theorem guarantees the existence of a real-valued, increasing, and bounded function $S : \mathbb{R} \to \mathbb{R}$ such that

$$r(\tau) = \int_{-\infty}^{\infty} e^{i \nu \tau}\, dS(\nu), \quad \tau \in \mathbb{R}. \tag{3.11}$$

Definition 3.15 The function $S$ in (3.11) is called the spectral distribution of $X$. If $S$ is absolutely continuous with respect to the Lebesgue measure on the real line, the Radon-Nikodym derivative $s(\nu) = dS(\nu)/d\nu$ exists and is called the spectral density of $X$. Since $S(\nu)$ is an increasing function, $s(\nu) \ge 0$ and

$$r(\tau) = \int_{-\infty}^{\infty} e^{i \nu \tau} s(\nu)\, d\nu \quad \text{and} \quad s(\nu) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-i \nu \tau} r(\tau)\, d\tau, \tag{3.12}$$

that is, $r(\tau)$ and $s(\nu)$ are Fourier pairs. The one-sided spectral density $g(\nu) = 2 s(\nu)\, 1(\nu \ge 0)$ is frequently used in applications with $\nu$ interpreted as frequency. The relationships between correlation and one-sided spectral density functions result from (3.12) and are

$$r(\tau) = \int_{0}^{\infty} \cos(\nu \tau)\, g(\nu)\, d\nu \quad \text{and} \quad s(\nu) = \frac{1}{\pi} \int_{0}^{\infty} \cos(\nu \tau)\, r(\tau)\, d\tau. \tag{3.13}$$

Note that the correlation and the spectral density functions provide equivalent information, $s$ is an even function, that is, $s(\nu) = s(-\nu)$, and $\int_{-\infty}^{\infty} s(\nu)\, d\nu = \int_{0}^{\infty} g(\nu)\, d\nu = E[X(t)^2] = r(0)$.

Example 3.18 The spectral distribution function of the weakly stationary real-valued process in Example 3.14 is $S(\nu) = a + \sum_{k=1}^{n} (\sigma_k^2/2)\,[1(\nu \ge -\nu_k) + 1(\nu \ge \nu_k)]$, where $a$ is an arbitrary constant. This expression of $S$ can be checked by direct calculations using (3.11). Although $S$ is not absolutely continuous, it is common in applications to define its spectral densities by

$$s(\nu) = \frac{1}{2} \sum_{k=1}^{n} \sigma_k^2 \left[\delta(\nu - \nu_k) + \delta(\nu + \nu_k)\right] \quad \text{and} \quad g(\nu) = \sum_{k=1}^{n} \sigma_k^2\, \delta(\nu - \nu_k), \tag{3.14}$$

where $\delta(\cdot)$ is the Dirac delta function. The representations in (3.14) show that the energy of $X$ is concentrated at a finite number of frequencies. ◊

Example 3.19 The processes with the following correlation and spectral density functions

$$\begin{aligned}
r(\tau) &= g_0\, \frac{\sin(\nu_c \tau)}{\tau}, & g(\nu) &= g_0\, 1(0 \le \nu \le \nu_c), \\
r(\tau) &= \sigma^2 e^{-\lambda |\tau|}, & g(\nu) &= \frac{2 \sigma^2 \lambda}{\pi (\nu^2 + \lambda^2)}, \quad \text{and} \\
r(\tau) &= \sigma^2 e^{-\lambda |\tau|} (1 + \lambda |\tau|), & g(\nu) &= \frac{4 \sigma^2 \lambda^3}{\pi (\nu^2 + \lambda^2)^2}
\end{aligned} \tag{3.15}$$

are referred to as band limited white noise, first order Markov, and second order Markov processes. In these expressions $g_0, \nu_c, \lambda, \sigma > 0$ are constants. ◊
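The Fourier pairs in (3.15) can be verified numerically from (3.13) and $g(\nu) = 2 s(\nu)$ for $\nu \ge 0$. The sketch below uses a plain trapezoidal rule on a truncated interval; the truncation point, step count, and parameter values are hypothetical choices, adequate here because the correlation functions decay exponentially.

```python
import math


def one_sided_density_numeric(r, v, tau_max=40.0, n=40000):
    """g(v) = (2/pi) * int_0^inf cos(v*tau) r(tau) dtau, trapezoidal rule on [0, tau_max]."""
    h = tau_max / n
    total = 0.5 * (r(0.0) + math.cos(v * tau_max) * r(tau_max))
    for j in range(1, n):
        tau = j * h
        total += math.cos(v * tau) * r(tau)
    return 2.0 / math.pi * h * total


sigma, lam = 1.5, 2.0  # hypothetical parameter values


def r_first(tau):
    """First order Markov correlation."""
    return sigma ** 2 * math.exp(-lam * abs(tau))


def r_second(tau):
    """Second order Markov correlation."""
    return sigma ** 2 * math.exp(-lam * abs(tau)) * (1.0 + lam * abs(tau))


def g_first(v):
    """Closed form from (3.15)."""
    return 2.0 * sigma ** 2 * lam / (math.pi * (v * v + lam * lam))


def g_second(v):
    """Closed form from (3.15)."""
    return 4.0 * sigma ** 2 * lam ** 3 / (math.pi * (v * v + lam * lam) ** 2)


for v in (0.0, 1.0, 3.0):
    print(v,
          one_sided_density_numeric(r_first, v), g_first(v),
          one_sided_density_numeric(r_second, v), g_second(v))
```

Each numerical value agrees with the corresponding closed form to within the quadrature error.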


3.5.2 $\mathbb{R}^d$-Valued Stochastic Processes

Let $X$ be an $\mathbb{R}^d$-valued weakly stationary process with correlation functions $r_{k,l}(\tau) = E[X_k(t + \tau) X_l(t)]$, $k, l = 1, 2, \ldots, d$. The following theorem extends the relationships in (3.12) to pairs of distinct coordinates of $X$.

Theorem 3.7 If $X$ is an $\mathbb{R}^d$-valued stochastic process with mean 0 and continuous correlation function $r(\tau) = E[X(t + \tau) X(t)']$, then

$$\begin{aligned}
r_{k,l}(\tau) &= \int_{\mathbb{R}} e^{i \nu \tau}\, dS_{k,l}(\nu) = \int_{\mathbb{R}} e^{i \nu \tau} s_{k,l}(\nu)\, d\nu, \quad \text{where} \\
S_{k,l} &= S_{l,k}^* = \tfrac{1}{2}\left[S_1 - i S_2 - (1 - i)(S_{k,k} + S_{l,l})\right], \quad k \ne l,
\end{aligned} \tag{3.16}$$

$S_{k,k}$ is the spectral distribution function of $X_k$, $s_{k,l}(\nu) = dS_{k,l}(\nu)/d\nu$ provided it exists, and $S_p$ denotes the spectral distribution of $Y_p(t) = \sum_{r=1}^{d} \xi_{p,r} X_r(t)$, $p = 1, 2$, where $\xi_{1,k} = \xi_{1,l} = 1$ and $\xi_{1,q} = 0$, $q \ne k, l$, and $\xi_{2,k} = i$, $\xi_{2,l} = 1$, and $\xi_{2,q} = 0$, $q \ne k, l$, for an arbitrary pair $(k, l)$ of coordinates of $X$.

Proof The representation of $r_{k,k}$ follows from (3.12). The correlation functions of $Y_1(t)$ and $Y_2(t)$ are

$$\begin{aligned}
r_{y_1}(\tau) &= r_{k,k}(\tau) + r_{l,l}(\tau) + r_{k,l}(\tau) + r_{l,k}(\tau) \quad \text{and} \\
r_{y_2}(\tau) &= r_{k,k}(\tau) + r_{l,l}(\tau) + i\, r_{k,l}(\tau) - i\, r_{l,k}(\tau),
\end{aligned}$$

and have the representations $r_{y_p}(\tau) = \int_{\mathbb{R}} e^{i \nu \tau}\, dS_p(\nu)$, $p = 1, 2$, which together with the previous equations and Bochner’s theorem give

$$r_{k,l}(\tau) = \int_{\mathbb{R}} e^{i \nu \tau}\, dS_{k,l}(\nu), \quad k \ne l, \quad \text{where} \quad S_{k,l} = S_{l,k}^* = \tfrac{1}{2}\left[S_1 - i S_2 - (1 - i)(S_{k,k} + S_{l,l})\right].$$

The spectral distributions $S_{k,l}$, $k \ne l$, are bounded complex-valued functions and the matrix $\{S_{k,l}\}$ is Hermitian since $S_{k,l} = S_{l,k}^*$.

It remains to show that it is possible to define a spectral density for distinct pairs of coordinates of $X$. Consider a bounded interval $[\nu, \nu + \Delta\nu)$ and set $\Delta S_{k,l}(\nu) = S_{k,l}(\nu + \Delta\nu) - S_{k,l}(\nu)$. The Hermitian matrix $\{\Delta S_{k,l}(\nu)\}$, $k, l = 1, \ldots, d$, has the property $\sum_{k,l=1}^{d} z_k z_l^*\, \Delta S_{k,l}(\nu) \ge 0$, $z_k \in \mathbb{C}$, since the correlation function of $Y(t) = \sum_{k=1}^{d} z_k X_k(t)$, $z_k \in \mathbb{C}$, is

$$r_y(\tau) = \int_{-\infty}^{\infty} e^{i \nu \tau}\, dS_y(\nu) = \sum_{k,l=1}^{d} z_k z_l^* \int_{-\infty}^{\infty} e^{i \nu \tau}\, dS_{k,l}(\nu),$$

and $S_y$ is an increasing function. For distinct indices $k, l \in \{1, \ldots, d\}$ and $z_q = 0$ for $q \ne k, l$ the sum $\sum_{k,l=1}^{d} z_k z_l^*\, \Delta S_{k,l}(\nu)$ becomes

$$\begin{aligned}
&\Delta S_{k,k}(\nu) |z_k|^2 + \Delta S_{l,l}(\nu) |z_l|^2 + 2\, \mathrm{Re}\big(\Delta S_{k,l}(\nu)\, z_k z_l^*\big) \ge 0, \quad \text{or} \\
&\Delta S_{k,k}(\nu) |z_k|^2 + \Delta S_{l,l}(\nu) |z_l|^2 + 2\, |\Delta S_{k,l}(\nu)|\, |z_k|\, |z_l| \ge 0,
\end{aligned}$$

because the real part of $\Delta S_{k,l}(\nu)\, z_k z_l^*$ is smaller than its absolute value. The latter expression divided by $|z_l|^2$ is a polynomial of $\eta = |z_k|/|z_l|$ that has no real roots so that $|\Delta S_{k,l}(\nu)|^2 \le |\Delta S_{k,k}(\nu)|\, |\Delta S_{l,l}(\nu)|$. Hence, $s_{k,l}(\nu)$ exists and satisfies the condition $|s_{k,l}(\nu)|^2 \le s_{k,k}(\nu)\, s_{l,l}(\nu)$, $\nu \in \mathbb{R}$ ([8], Sect. 8.1). ▲

Example 3.20 Let $X_k(t) = \sqrt{1 - \rho}\, W_k(t) + \sqrt{\rho}\, W(t)$, $k = 1, 2$, be the coordinates of an $\mathbb{R}^2$-valued process $X$, where $\rho \in (0, 1)$ and $(W_1, W_2, W)$ are zero-mean, weakly stationary, mutually uncorrelated processes. Then $X$ is a weakly stationary process with spectral densities $s_{k,k}(\nu) = (1 - \rho)\, s_{W_k}(\nu) + \rho\, s_W(\nu)$, $k = 1, 2$, and $s_{1,2}(\nu) = s_{2,1}(\nu) = \rho\, s_W(\nu)$, where $s_{W_k}$ and $s_W$ denote the spectral densities of $W_k$ and $W$, respectively. ◊

Proof The correlation functions of $X$ are $r_{k,l}(\tau) = (1 - \rho)\, \delta_{kl}\, r_{W_k}(\tau) + \rho\, r_W(\tau)$, where $r_{W_k}$ and $r_W$ denote the correlation functions of $W_k$ and $W$. The spectral densities $s_{k,k}$ are the Fourier transform of the correlation functions $r_{k,k}$ by Bochner’s theorem. The correlation functions of processes $Y_p$ with spectral distributions $S_p$, $p = 1, 2$, in (3.16) are $r_{y_1}(\tau) = (1 - \rho)(c_{W_1}(\tau) + c_{W_2}(\tau)) + 4 \rho\, c_W(\tau)$ and $r_{y_2}(\tau) = (1 - \rho)(c_{W_1}(\tau) + c_{W_2}(\tau)) + 2 \rho\, c_W(\tau)$, so that

$$\begin{aligned}
s_1(\nu) &= \frac{1}{2\pi} \int_{\mathbb{R}} e^{-i \nu \tau} r_{y_1}(\tau)\, d\tau = (1 - \rho)(s_{W_1}(\nu) + s_{W_2}(\nu)) + 4 \rho\, s_W(\nu), \\
s_2(\nu) &= \frac{1}{2\pi} \int_{\mathbb{R}} e^{-i \nu \tau} r_{y_2}(\tau)\, d\tau = (1 - \rho)(s_{W_1}(\nu) + s_{W_2}(\nu)) + 2 \rho\, s_W(\nu), \quad \text{and}
\end{aligned}$$

$s_{1,2}$ results from (3.16). ▲
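The last step of the proof, applying (3.16) with the factor $1/2$, can be checked with a few lines of arithmetic at a single frequency. The numerical values of $\rho$, $s_{W_1}$, $s_{W_2}$, and $s_W$ below are hypothetical.

```python
def cross_spectral_density(s1, s2, s_kk, s_ll):
    """s_{k,l} from (3.16): (1/2) * [s_1 - i s_2 - (1 - i)(s_kk + s_ll)]."""
    return 0.5 * (s1 - 1j * s2 - (1 - 1j) * (s_kk + s_ll))


rho = 0.3                        # hypothetical value in (0, 1)
s_w1, s_w2, s_w = 0.8, 1.1, 0.6  # hypothetical density values at one frequency

s_11 = (1 - rho) * s_w1 + rho * s_w
s_22 = (1 - rho) * s_w2 + rho * s_w
s1 = (1 - rho) * (s_w1 + s_w2) + 4 * rho * s_w  # density of Y_1 = X_1 + X_2
s2 = (1 - rho) * (s_w1 + s_w2) + 2 * rho * s_w  # density of Y_2 = i X_1 + X_2

s_12 = cross_spectral_density(s1, s2, s_11, s_22)
print("s_12 from (3.16):", s_12, " rho * s_W:", rho * s_w)
```

The imaginary parts cancel and the real part equals $\rho\, s_W$, in agreement with the statement of the example.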


3.5.3 R-Valued Random Fields 

Consider a real-valued random field $X$ defined on $\mathbb{R}^{d'}$ with mean, correlation, and covariance functions $\mu(t) = E[X(t)]$, $r(s, t) = E[X(s) X(t)]$, and $c(s, t) = E[(X(s) - \mu(s))(X(t) - \mu(t))]$. If $X$ is weakly stationary/homogeneous, then $\mu$ is constant and $r$ and $c$ depend on $s - t$.

If $X$ is weakly homogeneous and its correlation function is continuous, Bochner’s theorem applies and gives




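$$r(\tau) = \int_{\mathbb{R}^{d'}} e^{i \tau' \nu}\, dS(\nu) = \int_{\mathbb{R}^{d'}} e^{i \tau' \nu} s(\nu)\, d\nu, \tag{3.17}$$

where $S$ denotes the spectral distribution of $X$ ([2], Theorem 2.1.2). If $S$ is absolutely continuous, then $X$ has the spectral density $s(\nu) = \partial^{d'} S(\nu) / \partial \nu_1 \cdots \partial \nu_{d'}$. If $X$ has a spectral density, then its correlation and spectral density functions are Fourier pairs, that is,

$$r(\tau) = \int_{\mathbb{R}^{d'}} e^{i \tau' \nu} s(\nu)\, d\nu \quad \text{and} \quad s(\nu) = \frac{1}{(2\pi)^{d'}} \int_{\mathbb{R}^{d'}} e^{-i \tau' \nu} r(\tau)\, d\tau. \tag{3.18}$$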

Theorem 3.8 Let $X(t)$, $t \in \mathbb{R}^{d'}$, be a real-valued homogeneous random field with correlation and spectral density function in (3.18). Then $r(\tau) = r(-\tau)$ and $s(\nu) = s(-\nu)$ for all $\tau$ and $\nu$.

Proof We have $r(\tau) = E[X(t + \tau) X(t)] = E[X(t) X(t - \tau)] = E[X(t - \tau) X(t)] = r(-\tau)$ since $X$ is homogeneous. Also,

$$s(-\nu) = \frac{1}{(2\pi)^{d'}} \int_{\mathbb{R}^{d'}} e^{i \tau' \nu} r(\tau)\, d\tau = \frac{1}{(2\pi)^{d'}} \int_{\mathbb{R}^{d'}} e^{-i \tau' \nu} r(-\tau)\, d\tau = s(\nu),$$

by the change of variable $\tau \mapsto -\tau$ and the property $r(\tau) = r(-\tau)$. ▲

Example 3.21 Let $X$ be a real-valued, weakly stationary random field defined on $\mathbb{R}^2$ with mean 0, variance 1, and correlation function

$$r(\tau) = E[X(t + \tau) X(t)] = \exp\left(-\frac{1}{2}\left(\tau_1^2 + 2 \rho\, \tau_1 \tau_2 + \tau_2^2\right)\right), \quad \tau \in \mathbb{R}^2, \tag{3.19}$$

where $|\rho| < 1$. Since $r$ is continuous, it admits the representation in Bochner’s theorem with spectral density

$$s(\nu) = \frac{1}{2\pi \sqrt{1 - \rho^2}} \exp\left(-\frac{\nu_1^2 - 2 \rho\, \nu_1 \nu_2 + \nu_2^2}{2 (1 - \rho^2)}\right), \tag{3.20}$$

that is shown in Fig. 3.5 for $\rho = 0.7$. ◊
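The pair (3.19) and (3.20) can be checked against the first relation in (3.18) by a direct two-dimensional sum. The sketch below is an illustration only; the grid limits and resolution are hypothetical choices that work because the density decays like a Gaussian. Since $s(\nu) = s(-\nu)$, only the cosine part of $e^{i \tau' \nu}$ contributes.

```python
import math


def spectral_density(v1, v2, rho):
    """Spectral density (3.20)."""
    quad = (v1 * v1 - 2.0 * rho * v1 * v2 + v2 * v2) / (2.0 * (1.0 - rho * rho))
    return math.exp(-quad) / (2.0 * math.pi * math.sqrt(1.0 - rho * rho))


def correlation_exact(t1, t2, rho):
    """Correlation function (3.19)."""
    return math.exp(-0.5 * (t1 * t1 + 2.0 * rho * t1 * t2 + t2 * t2))


def correlation_from_density(t1, t2, rho, v_max=8.0, n=161):
    """Riemann sum for r(tau) = int cos(tau' v) s(v) dv over [-v_max, v_max]^2."""
    h = 2.0 * v_max / (n - 1)
    grid = [-v_max + j * h for j in range(n)]
    total = 0.0
    for v1 in grid:
        for v2 in grid:
            total += math.cos(t1 * v1 + t2 * v2) * spectral_density(v1, v2, rho)
    return total * h * h


rho = 0.7
for tau in ((0.0, 0.0), (0.5, -0.3), (1.0, 1.0)):
    print(tau, correlation_from_density(*tau, rho), correlation_exact(*tau, rho))
```

At $\tau = 0$ the sum returns the variance $r(0) = 1$, that is, the total mass of the spectral density.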

Example 3.22 Let $X_k$ be real-valued weakly stationary random fields with mean 0 and correlation functions $r_k(\tau_k) = E[X_k(t_k) X_k(t_k + \tau_k)]$, where $t_k, \tau_k \in \mathbb{R}$ and $k = 1, \ldots, d'$. If $\{X_k\}$ are mutually independent, then $X(t) = \prod_{k=1}^{d'} X_k(t_k)$ is a real-valued weakly stationary random field with mean 0 and correlation function $r(\tau) = \prod_{k=1}^{d'} r_k(p_k(\tau))$, where $t = (t_1, \ldots, t_{d'})$ and $p_k(\tau) = \tau_k$, $k = 1, \ldots, d'$, are projection functions and $\tau = (\tau_1, \ldots, \tau_{d'}) \in \mathbb{R}^{d'}$. ◊

Arguments similar to those used for $\mathbb{R}^d$-valued stochastic processes can be employed to define correlation/covariance functions and spectral densities for vector-valued random fields. Representations as in (3.16) can be constructed for the correlation functions of the same and distinct coordinates of $\mathbb{R}^d$- or $\mathbb{C}^d$-valued random fields. The remainder of this section considers a special class of weakly homogeneous random fields, referred to as isotropic random fields.

[Fig. 3.5 Spectral density $s(\nu)$ of $X$ with $\rho = 0.7$]

Definition 3.16 A real-valued random field $X$ defined in $\mathbb{R}^{d'}$ is said to be weakly isotropic if its correlation function $r(s, t) = E[X(s) X(t)] = r(\|s - t\|)$ depends only on the length $\|s - t\|$ of vector $s - t$.

The correlation function of these fields is subject to various constraints [9], some of which are stated in the following theorem. Note that a weakly isotropic random field is homogeneous, but the converse is not true.

Theorem 3.9 The correlation function $r(\tau)$ of a real-valued, weakly isotropic random field $X(t)$, $t \in \mathbb{R}^{d'}$, with mean 0 is such that $r(\tau) \ge -r(0)/d'$.

Proof Let $t_1, \ldots, t_{d'+1}$ be $d' + 1$ points in $\mathbb{R}^{d'}$ such that $\|t_i - t_j\| = a > 0$ for all $i \ne j$. The stated inequality follows from $E\big[\big|\sum_{k=1}^{d'+1} X(t_k)\big|^2\big] = (d' + 1)[r(0) + d'\, r(a)]$ and $E\big[\big|\sum_{k=1}^{d'+1} X(t_k)\big|^2\big] \ge 0$. ▲

Theorem 3.10 Let $X$ be a real-valued, weakly isotropic random field defined on $\mathbb{R}^{d'}$ with continuous correlation function admitting a spectral density $s(\nu)$. Then the correlation and spectral density of $X$ satisfy relationships as in (3.18), and $s(\nu)$ is a function of $\|\nu\|$.

Proof Since $X$ is an isotropic field, $r(t) = r(\varphi(t))$ for all $t \in \mathbb{R}^{d'}$, where $\varphi(t) = \big(\sum_{j=1}^{d'} a_{1j} t_j, \ldots, \sum_{j=1}^{d'} a_{d'j} t_j\big)' = a\, t$ defines a rotation, so that $a = \{a_{ij}\}$ is such that $a' = a^{-1}$ and $a' a = a\, a' = I$ is the identity matrix. Since $r(t) = \int_{\mathbb{R}^{d'}} \exp(i t' \nu)\, dS(\nu) = \int_{\mathbb{R}^{d'}} \exp(i \varphi(t)' \nu)\, dS(\nu) = r(\varphi(t))$ by assumption, and $\varphi(t)' \nu = (a\, t)' \nu = t' a' \nu = t' \psi(\nu)$, we have $\int_{\mathbb{R}^{d'}} \exp(i t' \psi(\nu))\, dS(\nu) = \int_{\mathbb{R}^{d'}} \exp(i t' \lambda)\, dS(\psi^{-1}(\lambda))$, where $\nu = \psi^{-1}(\lambda)$. This shows that the spectral density is invariant to rotation so that it depends only on $\|\nu\|$ ([2], Sect. 2.5). ▲
3.6 Second Moment Calculus

The Chebyshev and Cauchy-Schwarz inequalities and the following theorem giving properties of mean square convergent sequences are the main tools for second moment calculus.

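Theorem 3.11 If $X$, $\{X_n, n \ge 1\}$, $Y$, and $\{Y_n, n \ge 1\}$ are random variables with finite variance such that $X_n \xrightarrow{\text{m.s.}} X$ and $Y_n \xrightarrow{\text{m.s.}} Y$, then

$$\begin{aligned}
&(1)\ \lim_{n \to \infty} E[X_n] = E[\text{l.i.m.}_{n \to \infty} X_n] = E[X], \\
&(2)\ \lim_{m,n \to \infty} E[X_m Y_n] = E[(\text{l.i.m.}_{m \to \infty} X_m)(\text{l.i.m.}_{n \to \infty} Y_n)] = E[X Y], \ \text{and} \\
&(3)\ \text{if } X_n \xrightarrow{\text{m.s.}} X \text{ and } X_n \xrightarrow{\text{m.s.}} Z, \text{ then } P(X \ne Z) = 0,
\end{aligned} \tag{3.21}$$

where $\text{l.i.m.}_{n \to \infty} X_n = X$ means $E[(X_n - X)^2] \to 0$, $n \to \infty$.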

Proof Since $0 \le |E[X] - E[X_n]| = |E[X - X_n]| \le (E[(X - X_n)^2])^{1/2}$ holds by properties of expectation and the Cauchy-Schwarz inequality and $X_n \xrightarrow{\text{m.s.}} X$ by assumption, we have the convergence $E[X_n] \to E[X]$ stated in (1). For (2), note that

$$\begin{aligned}
|E[X_m Y_n - X Y]| &= |E[X_m Y_n - X_m Y + X_m Y - X Y]| \\
&\le |E[X_m (Y_n - Y)]| + |E[(X_m - X) Y]| \\
&\le \big(E[X_m^2]\, E[(Y_n - Y)^2]\big)^{1/2} + \big(E[(X_m - X)^2]\, E[Y^2]\big)^{1/2},
\end{aligned}$$

by properties of expectation and the Cauchy-Schwarz inequality. The postulated m.s. convergence implies $|E[X_m Y_n - X Y]| = |E[X_m Y_n] - E[X Y]| \to 0$ as $m, n \to \infty$. For (3), the inequalities $0 \le E[(X - Z)^2] = E[((X - X_n) + (X_n - Z))^2] \le 2 E[(X - X_n)^2] + 2 E[(X_n - Z)^2]$ following from $(a + b)^2 \le 2 a^2 + 2 b^2$ and the Chebyshev inequality $P(|X - Z| > \varepsilon) \le E[(X - Z)^2]/\varepsilon^2$, $\varepsilon > 0$, imply $P(|X - Z| > \varepsilon) = 0$, $\forall \varepsilon > 0$, that is, the uniqueness with probability 1 of the m.s. limit. ▲

The theorem shows that expectation and m.s. limit can be interchanged under some conditions and that the m.s. limit is unique with probability one, that is, the measure of the subset of $\Omega$ in which m.s. limits of a sequence $X_n$ may differ is zero. The following sections define m.s. continuity, differentiation, and integration, give criteria under which these operations can be performed, and present examples.

3.6.1 Continuity 


Definition 3.17 A real-valued random function $X(t)$, $t \in \mathbb{R}^{d'}$, with finite variance is m.s. continuous or continuous in the mean square sense at $t \in \mathbb{R}^{d'}$ if $\text{l.i.m.}_{\|s - t\| \to 0} X(s) = X(t)$, that is, $\lim_{\|s - t\| \to 0} E[(X(s) - X(t))^2] = 0$, where $\|\cdot\|$ denotes a norm in $\mathbb{R}^{d'}$. The random function $X$ is m.s. continuous in $I$ if it is m.s. continuous at every point in $I \subset \mathbb{R}^{d'}$. An $\mathbb{R}^d$-valued random function $X$ is m.s. continuous if its coordinates are m.s. continuous.

Theorem 3.12 A real-valued random function $X(t)$, $t \in I \subset \mathbb{R}^{d'}$, is m.s. continuous at $t \in I$ if and only if its correlation function $r(u, v) = E[X(u) X(v)]$ is continuous at $u = v = t$, that is, $\lim_{\|u - t\|, \|v - t\| \to 0} r(u, v) = r(t, t)$.

Proof We have

$$\begin{aligned}
|r(u, v) - r(t, t)| &= |E[X(u) X(v)] - E[X(t) X(u)] + E[X(t) X(u)] - E[X(t) X(t)]| \\
&\le |E[X(u)(X(v) - X(t))]| + |E[X(t)(X(u) - X(t))]| \\
&\le \big(E[X(u)^2]\, E[(X(v) - X(t))^2]\big)^{1/2} + \big(E[X(t)^2]\, E[(X(u) - X(t))^2]\big)^{1/2} \to 0
\end{aligned}$$

as $\|u - t\|, \|v - t\| \to 0$ since $X$ is m.s. continuous at $t \in I$ by assumption.

Conversely, if $r(u, v) \to r(t, t)$ as $\|u - t\|, \|v - t\| \to 0$, the mean square difference $E[(X(s) - X(t))^2] = r(s, s) + r(t, t) - 2 r(s, t) \to 0$ as $\|s - t\| \to 0$ implies the m.s. continuity of $X$ at $t \in I$. ▲

Example 3.23 A real-valued weakly stationary random function $X$ is m.s. continuous at $t$ if and only if $\lim_{\tau \to 0} r(\tau) = r(0)$, $\tau \in \mathbb{R}^{d'}$ (Theorem 3.12). For example, a stochastic process $X$ with correlation function $r(\tau) = \exp(-\lambda |\tau|)$, $\lambda > 0$, is m.s. continuous over the entire real line since $r(\tau)$ is continuous at 0.

Similarly, the weakly stationary stochastic process $X(t)$ in Example 3.14 is m.s. continuous at any $t \in \mathbb{R}$ since its covariance function $c(\tau) = \sum_{k=1}^{n} \sigma_k^2 \cos(\nu_k \tau)$ is continuous at $\tau = 0$. ◊

Example 3.24 Let $X(t) = \sum_{k=1}^{N(t)} Y_k$, $t \ge 0$, where $N(t)$ is a homogeneous Poisson process with intensity $\lambda > 0$ and $\{Y_k\}$ are iid random variables with finite variance. The mean and covariance functions of $X(t)$ are $E[X(t)] = \lambda t E[Y_1]$ and $c(s, t) = \lambda (s \wedge t) E[Y_1^2]$ ([4], Sect. 3.3). Since $c(u, v) \to c(t, t)$ as $u, v \to t$ implying $r(u, v) \to r(t, t)$ as $u, v \to t$, $X$ is m.s. continuous in $[0, \infty)$. However, the samples of $X$ have discontinuities at the jump times of $N$ showing that m.s. continuity provides limited if any information on sample properties. ◊


3.6.2 Differentiability 


Definition 3.18 Let $X(t)$, $t \in \mathbb{R}^{d'}$, be a real-valued function with finite variance and $\delta_i \in \mathbb{R}^{d'}$ a vector with coordinate $i \in \{1, \ldots, d'\}$ equal to 1 and all other coordinates equal to 0. The m.s. derivative of $X$ with respect to coordinate $i$ at $t$, denoted by $X_{,i}(t)$, is the m.s. limit of $(X(t + h \delta_i) - X(t))/h$ as $h \to 0$ provided it exists. If $d' = 1$, we denote $X_{,i}$ by $\dot{X}$, for example, the case of real-valued stochastic processes.

It is not possible to prove the existence of the m.s. derivative by calculating the expectation of $(X_{,i}(t) - (X(t + h \delta_i) - X(t))/h)^2$ since the limit $X_{,i}(t)$ is not known. We need to show that $\{(X(t + h_n \delta_i) - X(t))/h_n\}$ is a Cauchy sequence in $L_2$, so that it has a unique limit in this space as $h_n \downarrow 0$, $n \to \infty$, and this limit defines $X_{,i}$. As for continuity, an $\mathbb{R}^d$-valued random function is m.s. differentiable if its coordinates have this property.

Theorem 3.13 If $\partial^2 r(u, v)/\partial u_i \partial v_i$ exists and is finite at $(t, t) \in \mathbb{R}^{2d'}$, then $X_{,i}(t)$, $i = 1, \ldots, d'$, exists. The correlation function of $X_{,i}$ is $E[X_{,i}(s) X_{,j}(t)] = \partial^2 r(s, t)/\partial s_i \partial t_j$ ([2], Theorem 2.2.2 and [10], Sect. 4.4). If $X$ is weakly stationary, the requirements on $\partial^2 r(u, v)/\partial u_i \partial v_i$ need to be satisfied only at the origin since $r(u, v)$ depends only on $u - v$.

Proof Direct calculations give $E[Y_i(s) Y_j(t)] \to \partial^2 r(s, t)/\partial s_i \partial t_j$ as $h \to 0$, where $Y_i(s) = (X(s + h \delta_i) - X(s))/h$ and $h > 0$. ▲

Example 3.25 The weakly stationary process $X$ in Example 3.23 with exponential correlation is m.s. continuous. However, it is not m.s. differentiable since its correlation function is not twice differentiable at the origin. ◊

Example 3.26 Let $X$ be a real-valued m.s. differentiable stochastic process. Then, differentiation and expectation operators commute, that is,

$$\begin{aligned}
&\frac{d}{dt} E[X(t)] = E[\dot{X}(t)], \quad \frac{\partial}{\partial t} E[X(t) X(s)] = E[\dot{X}(t) X(s)], \quad \text{and} \\
&\frac{\partial^2}{\partial t\, \partial s} E[X(t) X(s)] = E[\dot{X}(t) \dot{X}(s)] \quad \text{or} \quad r_{\dot{X} \dot{X}}(t, s) = \frac{\partial^2 r(t, s)}{\partial t\, \partial s}.
\end{aligned} \tag{3.22}$$

These results extend directly to calculate expectations of higher order derivatives for real-valued processes and random fields. ◊

Proof Since $\dot{X}$ exists, the sequence $Y_n = (X(t + h_n) - X(t))/h_n$ converges in mean square to $\dot{X}(t)$ as $n \to \infty$ implying $\lim_{n \to \infty} E[Y_n] = E[\text{l.i.m.}_{n \to \infty} Y_n] = E[\dot{X}(t)]$, where $h_n \downarrow 0$ as $n \to \infty$ (Theorem 3.11). This fact and the equality $E[Y_n] = (E[X(t + h_n)] - E[X(t)])/h_n$ prove the first equality in (3.22). Similar arguments can be used to obtain the other formulas in this equation. ▲

Example 3.27 If $X$ is a m.s. differentiable, real-valued, weakly stationary process, the correlation function of $\dot{X}(t)$ can be calculated from $r_{\dot{X} \dot{X}}(\tau) = -r''(\tau)$ or $r_{\dot{X} \dot{X}}(\tau) = \int_{-\infty}^{\infty} \nu^2 e^{i \nu \tau}\, dS(\nu)$, where $S(\nu)$ and $r(\tau)$ denote the spectral distribution and the correlation function of $X$. These results follow from the previous example and Bochner’s theorem. If $r(\tau) = (1 + \lambda |\tau|) \exp(-\lambda |\tau|)$, $\lambda > 0$, we have $r''(\tau) = -\lambda^2 (1 - \lambda |\tau|) \exp(-\lambda |\tau|)$ and $E[\dot{X}(t)^2] = -r''(0) = \lambda^2$, so that $X$ with this covariance function is m.s. differentiable. ◊
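The contrast between Examples 3.25 and 3.27 can be seen from the second moment of the difference quotient, which for a weakly stationary process is $E[((X(t + h) - X(t))/h)^2] = 2 (r(0) - r(h))/h^2$. The sketch below evaluates this quantity for the two correlation functions; the value of $\lambda$ and the step sizes are hypothetical, and a bounded second moment is only an indication of m.s. differentiability, not a proof.

```python
import math


def ms_difference_quotient(r, h):
    """E[((X(t+h) - X(t)) / h)^2] = 2 * (r(0) - r(h)) / h^2 for weakly stationary X."""
    return 2.0 * (r(0.0) - r(h)) / (h * h)


lam = 2.0  # hypothetical parameter value


def r_exp(tau):
    """Exponential correlation of Example 3.25."""
    return math.exp(-lam * abs(tau))


def r_markov2(tau):
    """Second order Markov correlation of Example 3.27."""
    return (1.0 + lam * abs(tau)) * math.exp(-lam * abs(tau))


for h in (1e-1, 1e-2, 1e-3):
    print(h, ms_difference_quotient(r_exp, h), ms_difference_quotient(r_markov2, h))
```

For the exponential correlation the quantity grows like $2\lambda/h$ as $h \downarrow 0$, while for the second order Markov correlation it approaches $-r''(0) = \lambda^2$.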

Example 3.28 The correlation function of a standard Brownian motion $B(t)$ and a compound Poisson process $C(t)$ with $E[Y_1] = 0$ and $\lambda E[Y_1^2] = 1$ is $r(u, v) = E[B(u) B(v)] = E[C(u) C(v)] = u \wedge v$, so that these processes are not m.s. differentiable. Formal calculations give $\partial^2 r(u, v)/(\partial u\, \partial v) = \delta(u - v)$, where $\delta(\cdot)$ denotes the Dirac delta function, so that the one-sided spectral density of $\dot{B}(t)$ and $\dot{C}(t)$ is $g(\nu) = 1/\pi$, $\nu \ge 0$. Processes with constant spectral density and delta correlation functions are called white noise processes in the engineering and physics literature. The above calculations are meaningless since $\dot{B}(t)$ and $\dot{C}(t)$ do not exist in the m.s. sense. We will consider alternative ways of defining white noise processes based on properties of the Brownian, compound Poisson, and other processes. ◊


3.6.3 Integration 

We examine briefly integrals whose integrands or integrators are real-valued stochastic processes with finite variance, that is, integrals of the type $\int_a^b h(t)\, dX(t)$ and $\int_a^b X(t)\, dh(t)$, where $h : [a, b] \to \mathbb{R}$ is a real-valued deterministic function and $X$ denotes a real-valued stochastic process. Similar integrals can be defined for random fields ([2], Sect. 2.3).

Definition 3.19 Let $p_n = (a = t_0 < t_1 < \cdots < t_n = b)$ be a sequence of partitions of $[a, b]$ with intermediate points $t_k' \in [t_{k-1}, t_k]$ and mesh $\Delta(p_n) = \max_{1 \le k \le n} (t_k - t_{k-1}) \to 0$ as $n \to \infty$. If

$$S_{h,X}(p_n) = \sum_{k=1}^{n} h(t_k')\big(X(t_k) - X(t_{k-1})\big) \quad \text{and} \quad S_{X,h}(p_n) = \sum_{k=1}^{n} X(t_k')\big(h(t_k) - h(t_{k-1})\big) \tag{3.23}$$

are Cauchy sequences in $L_2$, their m.s. limits are denoted by $\int_a^b h(t)\, dX(t)$ and $\int_a^b X(t)\, dh(t)$, and are called m.s. Riemann-Stieltjes integrals of $h$ with respect to $X$ and $X$ with respect to $h$, respectively.

Most conditions for the existence of the integrals $\int_a^b h(t)\, dX(t)$ and $\int_a^b X(t)\, dh(t)$ involve metrics quantifying the variation of $h$ and $X$ over a partition $p_n = (a = t_0 < t_1 < \cdots < t_n = b)$ of $[a, b]$ ([5], Sect. 3.9.3.1).

Definition 3.20 The variations of $h$ and $X$ on $[a, b]$ relative to partition $p_n$ are $v_h(p_n) = \sum_{k=1}^{n} |h(t_k) - h(t_{k-1})|$ and $v_X(p_n) = \sum_{k=1}^{n} \|X(t_k) - X(t_{k-1})\|$, respectively, where $\|X(t)\| = (E[|X(t)|^2])^{1/2}$ denotes the norm of $X(t)$ in $L_2$. The corresponding total variations are $v_h = \sup_p \{v_h(p)\}$ and $v_X = \sup_p \{v_X(p)\}$, where the sup is over all partitions $p$ of $[a, b]$. If $v_h < \infty$ and $v_X < \infty$, $h$ and $X$ are said to be of bounded variation on $[a, b]$.

Definition 3.21 Let $p = (a = s_0 < s_1 < \cdots < s_m = b)$ and $q = (a = t_0 < t_1 < \cdots < t_n = b)$ be partitions of $[a, b]$, let $X$ be a real-valued process with correlation function $r$, and let $v_r(p, q) = \sum_{k=1}^{m} \sum_{l=1}^{n} |r(s_k, t_l) - r(s_k, t_{l-1}) - r(s_{k-1}, t_l) + r(s_{k-1}, t_{l-1})|$ denote the variation of $r$ on $[a, b] \times [a, b]$ relative to $p$ and $q$. If $\sup_{p,q} \{v_r(p, q)\} < \infty$, then $X$ is of bounded variation in the weak sense on $[a, b]$.

The following theorems, stated without proof, provide useful criteria for the existence of the integrals $\int_a^b h(t)\, dX(t)$ and $\int_a^b X(t)\, dh(t)$.

Theorem 3.14 If $X$ and $Z$ are real-valued m.s. continuous processes and $h : [a, b] \to \mathbb{R}$ is of bounded variation on $[a, b]$, then $\int_a^b X(t)\, dh(t)$ exists, $\big\|\int_a^b X(t)\, dh(t)\big\| \le \max_{t \in [a, b]} \|X(t)\|\, v_h$, and $E\big[\int_a^b X(t)\, dt \int_a^b Z(t)\, dt\big] = \int_{[a, b]^2} E[X(u) Z(v)]\, du\, dv$ ([5], Sect. 3.9.3.2).

Theorem 3.15 If $h : [a, b] \to \mathbb{R}$ is continuous and $X$ is a real-valued m.s. differentiable process with m.s. continuous derivative $\dot{X}$, then $\int_a^b h(t)\, dX(t) = \int_a^b h(t) \dot{X}(t)\, dt$ and both integrals exist ([5], Sect. 3.9.3.2).

Theorem 3.16 Let $h$ and $k$ be real-valued functions defined on a bounded interval $[a, b]$ of the real line, $X$, $Y$ be real-valued processes, and $\alpha, \beta \in \mathbb{R}$ some constants. If the following integrals exist, then ([5], Sect. 3.9.3.3)

$$\begin{aligned}
\int_a^b h(t)\, dX(t) &= h(t) X(t) \Big|_a^b - \int_a^b X(t)\, dh(t) && \text{(integration by parts)} \\
\int_a^b [\alpha h(t) + \beta k(t)]\, dX(t) &= \alpha \int_a^b h(t)\, dX(t) + \beta \int_a^b k(t)\, dX(t) && \text{(linearity)} \\
\int_a^b h(t)\, d[\alpha X(t) + \beta Y(t)] &= \alpha \int_a^b h(t)\, dX(t) + \beta \int_a^b h(t)\, dY(t) && \text{(linearity)} \\
E\left[\int_a^b h(t)\, dX(t)\right] &= \int_a^b h(t)\, dE[X(t)] && \text{(expectation and integration commute)} \\
E\left[\int_a^b X(t)\, dh(t)\right] &= \int_a^b E[X(t)]\, dh(t) && \text{(expectation and integration commute).}
\end{aligned} \tag{3.24}$$

Proof For proof, see [2] (Sect. 2.3), [11] (Sect. 5.3), [8], and [5] (Sect. 3.9.3). Note that the latter formula in (3.24) follows by Fubini’s theorem if $X$ is measurable, that is, the function $X : [a, b] \times \Omega \to \mathbb{R}$ is measurable from $([a, b] \times \Omega, \mathcal{B}([a, b]) \times \mathcal{F})$ to $(\mathbb{R}, \mathcal{B})$. ▲
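The integration by parts formula in (3.24) has an exact discrete counterpart for the sums in (3.23). If $S_{h,X}$ uses the left points $t_k' = t_{k-1}$ and $S_{X,h}$ uses the right points $t_k' = t_k$, the two sums telescope to $h(b) X(b) - h(a) X(a)$ on every partition and every sample path. The sketch below checks this for one sample of a single-harmonic process; the integrand `h`, the frequency, and the partition are hypothetical.

```python
import math
import random


def stieltjes_sums(h, x, grid):
    """Sums (3.23): S_{h,X} with left points and S_{X,h} with right points."""
    n = len(grid) - 1
    s_hx = sum(h(grid[k - 1]) * (x(grid[k]) - x(grid[k - 1])) for k in range(1, n + 1))
    s_xh = sum(x(grid[k]) * (h(grid[k]) - h(grid[k - 1])) for k in range(1, n + 1))
    return s_hx, s_xh


rng = random.Random(0)
a_coef, b_coef = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)  # random amplitudes


def x(t):
    """One sample path of a harmonic process as in (3.6) with a single frequency."""
    return a_coef * math.cos(3.0 * t) + b_coef * math.sin(3.0 * t)


def h(t):
    """Hypothetical deterministic function of bounded variation."""
    return t * t


a, b, n = 0.0, 2.0, 1000
grid = [a + (b - a) * k / n for k in range(n + 1)]
s_hx, s_xh = stieltjes_sums(h, x, grid)
boundary = h(b) * x(b) - h(a) * x(a)
print(s_hx + s_xh, boundary)
```

Passing to the m.s. limit along partitions with vanishing mesh, when both limits exist, gives the first line of (3.24).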

Example 3.29 Let $X$ be a real-valued m.s. continuous process and $h : [0, \infty) \to \mathbb{R}$ be a function of bounded variation on compacts. Then $Y(t) = \int_0^t X(s)\, dh(s)$, $t \ge 0$, is a m.s. differentiable process with mean $E[Y(t)] = \int_0^t E[X(s)]\, dh(s)$ and correlation function $E[Y(s) Y(t)] = \int_0^s \int_0^t E[X(u) X(v)]\, dh(u)\, dh(v)$. The existence of $Y$ and its second moment properties result from Theorem 3.14. ◊
3.6.4 Spectral Representation

Example 3.14 shows that superpositions of a finite number of harmonics with random amplitudes are real-valued, weakly stationary processes under some conditions. The spectral representation theorems in this section show that all weakly stationary random functions can be represented by sums of harmonics with random amplitudes.

Theorem 3.17 Let $X$ be a complex-valued, weakly stationary, m.s. continuous process with spectral distribution $S$. There exists a complex-valued process $Z$ with orthogonal increments, referred to as spectral process, defined up to an additive constant such that the m.s. integral

$$X(t) = \int_{-\infty}^{\infty} e^{i \nu t}\, dZ(\nu) \tag{3.25}$$

exists at any time $t$, $E[Z(\nu)] = 0$, $E[|Z(\nu)|^2] = S(\nu)$ provided $S(-\infty) = 0$, and $E[|dZ(\nu)|^2] = dS(\nu)$. If $S$ is absolutely continuous with respect to the Lebesgue measure, then $dS(\nu) = s(\nu)\, d\nu$ so that $E[|dZ(\nu)|^2] = s(\nu)\, d\nu$ ([8], Sect. 7.5).

Proof The integral in (3.25) exists since the integrand $e^{i \nu t}$ is continuous, the integrator $Z(\nu)$ has the property

$$\begin{aligned}
v_r(p, p) &= \sum_{k,l=1}^{n} \big|E[(Z(\nu_k) - Z(\nu_{k-1}))(Z(\nu_l) - Z(\nu_{l-1}))^*]\big| \\
&= \sum_{k=1}^{n} E[|Z(\nu_k) - Z(\nu_{k-1})|^2] = \sum_{k=1}^{n} (S(\nu_k) - S(\nu_{k-1})) = S(a) - S(-a),
\end{aligned}$$

for an arbitrary partition $p = (-a = \nu_0 < \nu_1 < \cdots < \nu_n = a)$ of $[-a, a]$, $a > 0$, and $\lim_{a \to \infty} [S(a) - S(-a)] = S(\infty) < \infty$, so that $Z(\nu)$ is of bounded variation in the weak sense ([11], Theorem 2.29). ▲

Theorem 3.18 Suppose $X$ is as in Theorem 3.17 except it is real-valued. The integral representation,

$$X(t) = \int_{0}^{\infty} \left[\cos(\nu t)\, dU(\nu) + \sin(\nu t)\, dV(\nu)\right], \tag{3.26}$$

exists in m.s. at any time $t$, where the real-valued processes $U$ and $V$ have orthogonal increments with $E[U(\nu)] = E[V(\nu)] = 0$, $E[dU(\nu)\, dV(\nu')] = 0$, and $E[dU(\nu)^2] = E[dV(\nu)^2] = 2\, dS(\nu) = 2 s(\nu)\, d\nu$ for all $\nu, \nu' \ge 0$, where the latter equality holds if $S$ is absolutely continuous ([8], Sect. 7.6).
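A common way to use (3.26) numerically, sketched here as an illustration and not taken from the text, is to replace the integral by a finite sum over frequency cells of width $\Delta\nu$. Since $E[dU(\nu)^2] = 2 s(\nu)\, d\nu = g(\nu)\, d\nu$, this gives the harmonic model (3.6) with $\sigma_k^2 = g(\nu_k) \Delta\nu$, whose covariance is (3.7). The cutoff frequency, number of cells, and parameter $\lambda$ below are hypothetical; the target is the first order Markov pair in (3.15) with $\sigma = 1$.

```python
import math


def discretize_spectrum(g, v_max, n):
    """Midpoint frequencies v_k and variances sigma_k^2 = g(v_k) * dv on (0, v_max]."""
    dv = v_max / n
    freqs = [(k + 0.5) * dv for k in range(n)]
    variances = [g(v) * dv for v in freqs]
    return freqs, variances


def model_correlation(tau, freqs, variances):
    """Covariance (3.7) of the harmonic model: sum_k sigma_k^2 * cos(v_k * tau)."""
    return sum(s2 * math.cos(v * tau) for v, s2 in zip(freqs, variances))


lam = 1.0  # hypothetical parameter value


def g(v):
    """One-sided spectral density of the first order Markov process, sigma = 1."""
    return 2.0 * lam / (math.pi * (v * v + lam * lam))


freqs, variances = discretize_spectrum(g, v_max=200.0, n=20000)
for tau in (0.0, 0.5, 2.0):
    print(tau, model_correlation(tau, freqs, variances), math.exp(-lam * abs(tau)))
```

The residual error at $\tau = 0$ is dominated by the spectral mass beyond the cutoff, roughly $2\lambda/(\pi \nu_{\max})$ for this density, so slowly decaying densities need a large cutoff.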

Example 3.30 Let $X$ be a complex-valued process with spectral representation in (3.25) and let $Y(t) = \sum_{k=1}^{n} c_k X(t + t_k)$, where $t_k \in \mathbb{R}$, $c_k \in \mathbb{C}$, and $n \ge 1$ denote times, complex constants, and an integer. The spectral representation of $Y$ is $Y(t) = \int_{-\infty}^{\infty} e^{i \nu t}\, d\tilde{Z}(\nu)$, where $d\tilde{Z}(\nu) = h(\nu)\, dZ(\nu)$ and $h(\nu) = \sum_{k=1}^{n} c_k e^{i \nu t_k}$ is the gain of the linear operator defining $Y(t)$. ◊

Proof We have $Y(t) = \int_{-\infty}^{\infty} \big[\sum_{k=1}^{n} c_k e^{i \nu t_k}\big] e^{i \nu t}\, dZ(\nu) = \int_{-\infty}^{\infty} h(\nu) e^{i \nu t}\, dZ(\nu)$, so that $Y$ consists of a superposition of harmonics $e^{i \nu t}$ with amplitudes $h(\nu)\, dZ(\nu)$. If $\int_{-\infty}^{\infty} |h(\nu)|^2\, dS(\nu) < \infty$, then $Y$ is in $L_2$. ▲

Example 3.31 If $X$ is a real-valued weakly stationary process and $\dot{X}$ exists in an m.s. sense, then

$$\dot{X}(t) = \int_{-\infty}^{\infty} i \nu\, e^{i \nu t}\, dZ(\nu) \tag{3.27}$$

with the notation in (3.25). ◊

Proof Let $X^{(a)}(t) = \int_{-a}^{a} e^{i \nu t}\, dZ(\nu)$ and $\dot{X}^{(a)}(t) = \int_{-a}^{a} i \nu\, e^{i \nu t}\, dZ(\nu)$ for $a, \tau > 0$. Straightforward calculations show that $E\big[\big((X^{(a)}(t + \tau) - X^{(a)}(t))/\tau - \dot{X}^{(a)}(t)\big)^2\big]$ converges to 0 as $\tau \to 0$ for any $a > 0$. The formula in (3.27) follows by letting $a \to \infty$ in the latter expectation. ▲

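Theorem 3.19 Let $X$ be an $\mathbb{R}^d$-valued, weakly stationary, and m.s. continuous process with spectral density functions $\{s_{k,l}\}$, $k, l = 1, \ldots, d$. The integral representation,

$$X(t) = \int_{0}^{\infty} \left[\cos(\nu t)\, dU(\nu) + \sin(\nu t)\, dV(\nu)\right] \tag{3.28}$$

exists in m.s. at any time, where $U$ and $V$ are $\mathbb{R}^d$-valued processes with mean zero and orthogonal increments such that

$$\begin{aligned}
E[dU_k(\nu)\, dU_l(\nu')] &= E[dV_k(\nu)\, dV_l(\nu')] = \delta(\nu - \nu')\, g_{k,l}(\nu)\, d\nu \quad \text{and} \\
E[dU_k(\nu)\, dV_l(\nu')] &= -E[dV_k(\nu)\, dU_l(\nu')] = \delta(\nu - \nu')\, h_{k,l}(\nu)\, d\nu,
\end{aligned} \tag{3.29}$$

where $g_{k,l}(\nu) = s_{k,l}(\nu) + s_{k,l}(-\nu)$ and $h_{k,l}(\nu) = -i\,[s_{k,l}(\nu) - s_{k,l}(-\nu)]$.

Proof We have

$$\begin{aligned}
r_{k,l}(\tau) &= E[X_k(t + \tau) X_l(t)] = \int_{\mathbb{R}} e^{i \nu \tau} s_{k,l}(\nu)\, d\nu \\
&= \int_{0}^{\infty} \Big[\big(s_{k,l}(\nu) + s_{k,l}(-\nu)\big) \cos(\nu \tau) + i \big(s_{k,l}(\nu) - s_{k,l}(-\nu)\big) \sin(\nu \tau)\Big]\, d\nu
\end{aligned}$$

by (3.16), so that $r_{k,l}(\tau) = \int_{0}^{\infty} [g_{k,l}(\nu) \cos(\nu \tau) - h_{k,l}(\nu) \sin(\nu \tau)]\, d\nu$. The expressions of $r_{k,l}$ yield the second moment properties of $U$ and $V$ in (3.29). ▲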

Example 3.32 Let $Z$ be a complex-valued random field defined on $\mathbb{R}^{d'}$ that has orthogonal increments with $E[Z(I)] = 0$ and $E[|Z(I)|^2] = S(I)$, where $S$ is a measure on $\mathbb{R}^{d'}$ such that $S(-\infty, \ldots, -\infty) = 0$ and $I$ is a Borel set in $\mathbb{R}^{d'}$. Then $X(t) = \int_{\mathbb{R}^{d'}} \exp(i t' \nu)\, dZ(\nu)$ is a weakly stationary random field with $E[X(t)] = 0$ and $E[|X(t)|^2] = \int_{\mathbb{R}^{d'}} dS(\nu)$. ◊

Proof Consider the sequence $Y_n(a) = \sum_{q=1}^{n} g(\nu_q)\, Z(I_q(a))$, where $g : \mathbb{R}^{d'} \to \mathbb{C}$ satisfies the condition $\int_{\mathbb{R}^{d'}} |g(\nu)|^2\, dS(\nu) < \infty$, $\{I_q(a), q = 1, \ldots, n\}$ denotes a partition of $(-a, a]^{d'}$, $a > 0$, such that $\lambda(I_q(a)) \to 0$ as $n \to \infty$, and $\nu_q \in I_q(a)$. The m.s. integral $\int_{\mathbb{R}^{d'}} g(\nu)\, dZ(\nu)$ is defined as the limit of the Cauchy sequence $\{Y_n(a)\}$ in $L_2$ as $n \to \infty$ and $a \to \infty$. The mean and variance of $\int_{\mathbb{R}^{d'}} g(\nu)\, dZ(\nu)$ are 0 and $\int_{\mathbb{R}^{d'}} |g(\nu)|^2\, dS(\nu)$. The field $X(t)$ is a special case of $\int_{\mathbb{R}^{d'}} g(\nu)\, dZ(\nu)$. ▲

Theorem 3.20 Let $X$ be a real-valued, m.s. continuous, weakly homogeneous random field defined on $\mathbb{R}^{d'}$. The integral representation,

$$X(t) = \int_{\mathbb{R}^{d'}} e^{i t' \nu}\, dZ(\nu), \tag{3.30}$$

exists in m.s. at any $t \in \mathbb{R}^{d'}$, where $Z$ is a complex-valued random field with orthogonal increments defined on $\mathbb{R}^{d'}$ such that $E[dZ(\nu)] = 0$, $E[|dZ(\nu)|^2] = dS(\nu)$, and $S$ is defined by (3.17). The correlation function of $X$ is $r(s, t) = \int_{\mathbb{R}^{d'}} e^{i (s - t)' \nu}\, dS(\nu)$ ([2], Theorem 2.4.1).


3.6.5 Karhunen-Loeve Expansion 

Let $X(t)$, $t \in I$, be a complex-valued random function with mean 0, finite variance, and correlation function $r(s, t) = E[X(s) X(t)^*]$, where $I \subset \mathbb{R}^{d'}$ is a compact set. The integral equation,

$$\int_I r(s, t)\, \phi(t)\, dt = \lambda\, \phi(s), \quad s \in I, \tag{3.31}$$

has the form $T \phi = \lambda \phi$, where $T : L_2(I) \to L_2(I)$ is a linear operator defined by $T \phi(s) = \int_I r(s, t)\, \phi(t)\, dt$ and $L_2(I)$ is the Hilbert space of square integrable functions on $I$ with the inner product $\langle \phi, \psi \rangle = \int_I \phi(t)\, \psi(t)^*\, dt$.

Definition 3.22 A non-zero number $\lambda$ for which there exists a function $\varphi$ satisfying
both (3.31) and the integrability condition $\int_I |\varphi(t)|^2\, dt < \infty$ is called an eigenvalue.
The function $\varphi$ is the eigenfunction of $\lambda$.

Theorem 3.21 The eigenvalues $\lambda$ of $T\varphi = \lambda\varphi$ are real-valued and the eigenfunctions
corresponding to distinct eigenvalues are orthogonal, that is, $\int_I \varphi(t)\, \psi(t)^*\, dt = 0$
for any pair $(\varphi, \psi)$ of eigenfunctions associated with distinct eigenvalues.

Proof This follows from Theorem B.58 since $T$ is a self-adjoint operator. The eigenfunctions
$\varphi$ can be scaled to have unit norm, that is, $\int_I |\varphi(t)|^2\, dt = 1$. ▲

Note that (3.31) has at least one eigenvalue and one eigenfunction, the collection of
eigenvalues of (3.31) is at most countable, and for each eigenvalue there exists at
most a finite number of linearly independent eigenfunctions (Sect. B.4.5; [2], Sect. 3.3;
[12], Appendix 2; and [13], Sect. 6.2).

Theorem 3.22 If $r$ is square integrable and continuous in $I \times I$, then $X$ admits the
representation

\[ X(t) = \operatorname*{l.i.m.}_{n \to \infty} \sum_{k=1}^{n} \lambda_k^{1/2}\, X_k\, \varphi_k(t), \quad t \in I, \quad \text{where} \]

\[ X_k = \lambda_k^{-1/2} \int_I X(t)\, \varphi_k(t)^*\, dt, \quad E[X_k] = 0, \quad \text{and} \quad E[X_k X_l^*] = \delta_{kl}. \tag{3.32} \]

Moreover, the Mercer theorem holds and gives $r(t, s) = \sum_{k=1}^{\infty} \lambda_k\, \varphi_k(t)\, \varphi_k(s)^*$.

Proof The mean and correlation functions of $X^{(n)}(t) = \sum_{k=1}^{n} \lambda_k^{1/2}\, X_k\, \varphi_k(t)$ are
$E[X^{(n)}(t)] = 0$ and $E[X^{(n)}(s)\,(X^{(n)}(t))^*] = \sum_{k=1}^{n} \lambda_k\, \varphi_k(s)\, \varphi_k(t)^*$ ([2], Sect. 3.3;
[12], Appendix 2; [13], Sect. 6.2). ▲

Example 3.33 Let $X$ be a real-valued, weakly stationary, m.s. periodic process with
period $T > 0$, mean 0, and correlation function $r(\tau) = E[X(t + \tau) X(t)]$. The
Karhunen-Loeve expansion and the spectral representation of $X$ coincide. ◊

Proof A process $X$ is said to be m.s. periodic with period $T$ if its correlation function
is periodic with period $T$, that is, $r(\tau) = \sum_{k=-\infty}^{\infty} c_k \exp(i k \nu_0 \tau)$, where $\nu_0 = 2\pi/T$
and $c_k = c_{-k}$. The spectral density of $X$ is $s(\nu) = \sum_{k=-\infty}^{\infty} c_k\, \delta(\nu - k\nu_0)$, and has a
countable number of frequencies. The solutions of (3.31) for $d' = 1$ and $I = [0, T]$
are $\lambda_k = T c_k$ and $\varphi_k(t) = \exp(i k \nu_0 t)/\sqrt{T}$, $k = 0, \pm 1, \pm 2, \ldots$, so that
$X(t) = \operatorname{l.i.m.}_{n \to \infty} X^{(n)}(t)$, where $X^{(n)}(t) = \sum_{k=-n}^{n} V_k \exp(i k \nu_0 t)$ and the random
variables in the representation of $X^{(n)}(t)$ are given by
$V_k = (1/T) \int_0^T X(t) \exp(-i k \nu_0 t)\, dt$.
The spectral representation-based expansion of $X$ has the same functional form
(Example 3.15). Related results can be found in [14]. ▲


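Example 3.34 Let $X(t)$, $t \in [-\xi, \xi]$, $\xi > 0$, be a real-valued stochastic process
with mean zero and correlation function $r(t, s) = (1/4)\, e^{-2\alpha|\tau|}$, where $\tau = t - s$
and $\alpha > 0$. The Karhunen-Loeve representation for $X$ in $[-\xi, \xi]$ is

\[ X(t) = \sum_{k=1}^{\infty} \left[ \sqrt{\lambda_k}\, X_k\, \varphi_k(t) + \sqrt{\hat\lambda_k}\, \hat X_k\, \hat\varphi_k(t) \right], \tag{3.33} \]

where

\[ \lambda_k = \frac{1}{4\alpha(1 + b_k^2)}, \qquad \varphi_k(t) = \frac{\cos(2\alpha b_k t)}{\sqrt{\xi + \sin(4\alpha b_k \xi)/(4\alpha b_k)}}, \]

\[ \hat\lambda_k = \frac{1}{4\alpha(1 + \hat b_k^2)}, \qquad \hat\varphi_k(t) = \frac{\sin(2\alpha \hat b_k t)}{\sqrt{\xi - \sin(4\alpha \hat b_k \xi)/(4\alpha \hat b_k)}}, \]

$b_k$, $\hat b_k$ are the solutions of $b_k \tan(2\alpha\xi b_k) = 1$, $\hat b_k \cot(2\alpha\xi \hat b_k) = -1$, and the series in
(3.33) converges in mean square. The eigenvalues and eigenfunctions in the representation
of $X$ are the solutions of the integral equation $\int_{-\xi}^{\xi} e^{-2\alpha|t - s|}\, \varphi(s)\, ds = 4\lambda\, \varphi(t)$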
([12], Example 6-4.1, p. 99). ◊

Example 3.35 The Karhunen-Loeve representation of a real-valued process $X$
defined on $[0, 1]$ with mean zero and correlation function $r(t, s) = t \wedge s$ is

\[ X(t) = \operatorname*{l.i.m.}_{n \to \infty} \sqrt{2} \sum_{k=0}^{n} \frac{\sin((k + 1/2)\pi t)}{(k + 1/2)\pi}\, X_k, \quad t \in [0, 1], \tag{3.34} \]

where $X_0, X_1, \ldots$ are uncorrelated random variables with means $E[X_k] = 0$ and
variances $E[X_k^2] = 1$. ◊

Proof The integral equation (3.31) becomes $\int_0^t s\, \varphi(s)\, ds + t \int_t^1 \varphi(s)\, ds = \lambda\, \varphi(t)$,
$t \in (0, 1)$. The first and second derivatives of this equation with respect to $t$ are
$\int_t^1 \varphi(s)\, ds = \lambda\, \varphi'(t)$ and $\lambda\, \varphi''(t) + \varphi(t) = 0$, respectively. The boundary conditions
for the latter equation are $\varphi(0) = 0$ and $\varphi'(1) = 0$, and result from (3.31) and its
first derivative at $t = 0$ and $t = 1$, respectively. The solution of $\lambda\, \varphi''(t) + \varphi(t) = 0$
with these boundary conditions gives the eigenvalues $\lambda_k = 1/((k + 1/2)\pi)^2$ and the
eigenfunctions $\varphi_k(t) = \sqrt{2}\, \sin((k + 1/2)\pi t)$, $k = 0, 1, \ldots$, for the
Karhunen-Loeve representation of $X$.

The representation in (3.34) applies to both Brownian motion and compound
Poisson processes with zero-mean jumps scaled such that $\lambda E[Y_1^2] = 1$, since their
correlation functions coincide with that of $X$. This shows
that the Karhunen-Loeve representation cannot distinguish between processes with
the same second moment properties. ▲
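The Mercer expansion of Theorem 3.22 can be checked numerically for the kernel of Example 3.35. The sketch below is only an illustration: it uses the eigenpairs obtained by solving $\lambda\varphi'' + \varphi = 0$ with $\varphi(0) = 0$, $\varphi'(1) = 0$, and the function names (`kl_eigenpair`, `mercer_sum`) are hypothetical helpers, not part of the text.

```python
import math


def kl_eigenpair(k):
    """Eigenvalue and eigenfunction of (3.31) for r(t, s) = min(t, s) on [0, 1]."""
    freq = (k + 0.5) * math.pi
    lam = 1.0 / freq**2
    return lam, (lambda t: math.sqrt(2.0) * math.sin(freq * t))


def mercer_sum(t, s, n_terms):
    """Partial Mercer sum of lam_k * phi_k(t) * phi_k(s) for k = 0, ..., n_terms - 1."""
    total = 0.0
    for k in range(n_terms):
        lam, phi = kl_eigenpair(k)
        total += lam * phi(t) * phi(s)
    return total


if __name__ == "__main__":
    # The partial sums should approach min(0.3, 0.7) = 0.3 as n grows.
    for n in (1, 10, 100, 1000):
        print(n, mercer_sum(0.3, 0.7, n))
```

The partial sums approach $t \wedge s$ slowly (the tail decays like $1/n$), which is one reason truncated Karhunen-Loeve models of Brownian motion need many terms to capture the variance near $t = 1$.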

Example 3.36 Let $W(t)$, $t \in [a, b]$, be a white noise defined heuristically as a
process with mean zero and correlation function $E[W(t) W(s)] = \gamma\, \delta(t - s)$, where
$s, t \in [a, b]$, $\gamma > 0$. This definition of $W$ is frequently used in engineering
applications [10]. The Karhunen-Loeve representation of $W$ is
$W(t) = \sqrt{\gamma} \sum_{k=1}^{\infty} W_k\, \varphi_k(t)$,
where $W_k$ are random variables with $E[W_k] = 0$ and $E[W_k W_l] = \delta_{kl}$ and $\{\varphi_k\}$ is
an arbitrary family of orthonormal functions spanning $L_2[a, b]$. The representation
follows from (3.31), which takes the form $\gamma\, \varphi(t) = \lambda\, \varphi(t)$ for white noise processes.
The calculations in this example are formal since $W$ does not exist in the second
moment sense. ◊


3.7 Classes of Stochastic Processes 

We have seen that the concept of stationarity need not be specialized for stochastic
processes and random fields. Similarly, there is no need to provide distinct definitions for
Gaussian, translation, and ergodic stochastic processes and random fields. However,
Markov and independent increment random functions in time and space differ significantly.
This section does not discuss Markov random fields. Useful information
on this topic can be found in [2] (Chap. 8).

3.7.1 Gaussian Random Functions


Definition 3.23 An $\mathbb{R}^d$-valued random function $X(t)$, $t \in \mathbb{R}^{d'}$, is said to be Gaussian
if its finite dimensional distributions are Gaussian.

Note that weakly stationary Gaussian functions are stationary and linear transformations
of Gaussian functions are Gaussian (Exercise 3.11). Also, if $X$ is Gaussian, the processes
$Z$ and $(U, V)$ in the spectral representations given by (3.25) and (3.26) and the random
variables $X_k$ of the Karhunen-Loeve representation in (3.32) are Gaussian.

Example 3.37 Let $X(t)$, $t \in I$, be a Gaussian function with mean 0, where $I$ is a
closed set in $\mathbb{R}^{d'}$. The random variables $\{X_k\}$ in the Karhunen-Loeve expansion of
$X(t)$ given by (3.32) are independent Gaussian variables with means $E[X_k] = 0$ and
covariances $E[X_k X_l] = \delta_{kl}$. ◊

Proof The random variables $X_k$ in (3.32) are stochastic integrals defined as m.s. limits
of the sequences $S_{X,h}(p_n)$ in (3.23). These sequences are Gaussian variables as linear
transformations of Gaussian variables, that is, values of $X(t)$ at discrete times, with
characteristic functions $\varphi_{S_{X,h}(p_n)}(u) = \exp(-u^2\, \mathrm{Var}[S_{X,h}(p_n)]/2)$. Since $\exp(i\xi)$ is
Lipschitz, that is, there is a constant $\alpha > 0$ such that $|\exp(i\xi') - \exp(i\xi'')| \le
\alpha\, |\xi' - \xi''|$, we have

\[ \left| e^{i u S_{X,h}(p_n)} - e^{i u \int_I X(t)\, \varphi_k(t)^*\, dt} \right| \le \alpha\, |u| \left| S_{X,h}(p_n) - \int_I X(t)\, \varphi_k(t)^*\, dt \right| \]

so that $\exp(i u S_{X,h}(p_n)) \to \exp\big(i u \int_I X(t)\, \varphi_k(t)^*\, dt\big)$ in m.s., and

\[
\begin{aligned}
E\left[ \exp\left( i u \int_I X(t)\, \varphi_k(t)^*\, dt \right) \right]
&= \lim_{n \to \infty} E[\exp(i u S_{X,h}(p_n))] \\
&= \lim_{n \to \infty} \exp(-u^2\, \mathrm{Var}[S_{X,h}(p_n)]/2) \\
&= \exp\left( -u^2\, \mathrm{Var}\left[ \int_I X(t)\, \varphi_k(t)^*\, dt \right] / 2 \right)
\end{aligned}
\]

by Theorem 3.11, so that $X_k$ is a Gaussian variable. ▲


3.7.2 Translation Random Functions 


Definition 3.24 Let $G(t)$, $t \in \mathbb{R}^{d'}$, be an $\mathbb{R}^d$-valued stationary Gaussian function
and let $h_i : \mathbb{R}^d \to \mathbb{R}$, $i = 1, \ldots, d$, be measurable functions. The $\mathbb{R}^d$-valued
random function $X(t)$ with coordinates $X_i(t) = h_i(G(t))$, $t \in \mathbb{R}^{d'}$, is referred to as a
translation random function.

It is common in applications to define the coordinates of $X(t)$ by $X_i(t) =
h_i(G_i(t))$, $i = 1, \ldots, d$, where $h_i : \mathbb{R} \to \mathbb{R}$ are continuous functions, so that
$X_i(t)$ are memoryless transformations of the coordinates $\{G_i(t)\}$ of $G(t)$.

Theorem 3.23 If $X(t)$, $t \in \mathbb{R}^{d'}$, is an $\mathbb{R}^d$-valued translation random function defined
by $X_i(t) = h_i(G_i(t))$, $i = 1, \ldots, d$, where $h_i = F_i^{-1} \circ \Phi : \mathbb{R} \to \mathbb{R}$, $\{F_i\}$
denote distributions with finite variances, $\Phi$ is the distribution of $N(0, 1)$, and the
coordinates of $G(t)$, $t \in \mathbb{R}^{d'}$, are real-valued Gaussian functions with mean 0 and
variance 1, then

(1) If $G$ is stationary so is $X$,

(2) $P(X_i(t) \le x_i) = F_i(x_i)$, $i = 1, \ldots, d$,

(3) $E[X_i(t)] = \int_{\mathbb{R}} h_i(u)\, \phi(u)\, du$,

(4) $E[X_i(s) X_j(t)] = \int_{\mathbb{R}^2} h_i(u)\, h_j(v)\, \phi(u, v; \rho_{ij}(s, t))\, du\, dv = E[h_i(G_i(s))\, h_j(G_j(t))]$,

(5) $|\xi_{ij}(s, t)| \le |\rho_{ij}(s, t)|$, $s, t \in \mathbb{R}^{d'}$, and

(6) Given $(F_i, \xi_{ij}(\cdot, \cdot))$, $\exists\, G_i$ such that $X_i(t) = h_i(G_i(t))$ if and only if
$\rho_{ij}(\cdot, \cdot)$ satisfying (4) are correlation functions, $i, j = 1, \ldots, d$,

(3.35)

where $\phi(\cdot)$ is the density of $N(0, 1)$, $\phi(\cdot, \cdot; \rho)$ is the joint density of a standard bivariate
Gaussian vector with correlation coefficient $\rho$,
$\xi_{ij}(s, t) = (E[X_i(s) X_j(t)] - E[X_i(s)]\, E[X_j(t)]) / (\mathrm{Var}[X_i(s)]\, \mathrm{Var}[X_j(t)])^{1/2}$,
and $\mathrm{Var}[X_i(s)] = E[X_i(s)^2] - E[X_i(s)]^2$.

Proof Since $P(X_i(t_1) \le x_1, \ldots, X_i(t_n) \le x_n) = P(G_i(t_1) \le y_1, \ldots, G_i(t_n) \le y_n)$,
$y_k = \Phi^{-1} \circ F_i(x_k)$, then $X$ is stationary if $G$ has this property. The marginal
distribution of $X_i$ is $P(X_i(t) \le \xi) = P(G_i(t) \le \Phi^{-1} \circ F_i(\xi)) = F_i(\xi)$.
Formulas in (3) and (4) result from the definitions of $X$ and of the joint distribution
of $(G_i(s), G_j(t))$. The inequality in (5) follows from a maximal property of
bivariate Gaussian distributions [15]. For (6), first note that $\rho_{ii}(\tau) = 0$ and $1$ imply
$\xi_{ii}(\tau) = 0$ and $1$, respectively. The correlation coefficient $\xi_i^* = \xi_{ii}(\tau)$ corresponding
to $\rho_{ii}(\tau) = -1$ can be calculated from

\[ \xi_i^* = \frac{E[h_i(G_i)\, h_i(-G_i)] - E[h_i(G_i)]^2}{E[h_i(G_i)^2] - E[h_i(G_i)]^2}, \tag{3.36} \]

and has the property $\xi_i^* \ge -1$. The value $\xi_i^* = -1$ can be reached if, for example,
$h_i$ is an odd function, in which case $E[h_i(G_i)] = 0$ and $E[h_i(G_i)\, h_i(-G_i)] =
-E[h_i(G_i)\, h_i(G_i)]$. A theorem by Price ([4], Sect. 3.1.1) gives the relationship

\[ \frac{\partial^k \xi_{ii}(\tau)}{\partial \rho_{ii}(\tau)^k} = \frac{1}{\mathrm{Var}[X_i(t)]}\, E\left[ h_i^{(k)}(G_i(t + \tau))\, h_i^{(k)}(G_i(t)) \right], \quad i = 1, \ldots, d, \tag{3.37} \]

so that, since $h_i'(\cdot) \ge 0$, $\xi_{ii}(\tau)$ is an increasing function of $\rho_{ii}(\tau) \in [-1, 1]$ taking
values in $[\xi_i^*, 1]$.

Translation functions do not exist for pairs $(\{F_i\}, \{\xi_{ij}\})$ with $\xi_{ii}$ taking values
outside the range $[\xi_i^*, 1]$. The requirement $\xi_{ii}(\tau) \in [\xi_i^*, 1]$ for all lags $\tau$ is necessary but
not sufficient for the existence of translation functions. The existence of translation
functions also requires that the images $\rho_{ij}$ of $\xi_{ij}$ be positive definite. Additional
considerations on the existence of translation random functions can be found in [4]
(Sect. 3.1) and [16]. ▲

Example 3.38 Let $X(t) = G(t)^3$, $t \in \mathbb{R}$, where $G$ is a stationary Gaussian process
with mean 0, variance 1, and correlation function $\rho(\tau) = E[G(t + \tau) G(t)]$. The
scaled covariance function of $X$ is $\xi(\tau) = \rho(\tau)(3 + 2\rho(\tau)^2)/5$, a result that can be
obtained by direct calculations using $E[G(t + \tau)^3 G(t)^3] = 9\rho(\tau) + 6\rho(\tau)^3$ and
$\mathrm{Var}[G(t)^3] = 15$. We have $\xi^* = -1$ in agreement with a previous
observation regarding odd transformations. ◊

Example 3.39 Set $X(t) = \exp(G(t))$, $t \in \mathbb{R}$, where $G(t)$ is as in Example 3.38.
Then $\xi(\tau) = (1 - \exp(\rho(\tau)))/(1 - \exp(1))$, so that $\xi^* = -e^{-1} \approx -0.3679$. As previously
indicated, $\xi(\tau) = 0$ and $1$ for $\rho(\tau) = 0$ and $1$, respectively. ◊
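The scaled covariance functions of Examples 3.38 and 3.39 can be reproduced by numerical integration. The sketch below is an illustration under stated assumptions: it writes $G(t + \tau) = \rho\, G(t) + \sqrt{1 - \rho^2}\, Z$ with $Z \sim N(0, 1)$ independent of $G(t)$, and approximates the Gaussian expectations by a rectangle rule on $[-8, 8]$. The function names and the grid parameters are hypothetical choices, not part of the text.

```python
import math


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def scaled_covariance(h, rho, half_width=8.0, step=0.1):
    """xi(rho) for X = h(G): correlation of h(G1) and h(G2) when corr(G1, G2) = rho."""
    n = int(round(2.0 * half_width / step))
    nodes = [-half_width + k * step for k in range(n + 1)]
    weights = [std_normal_pdf(x) * step for x in nodes]
    # G2 = rho * G1 + sqrt(1 - rho^2) * Z, with Z independent of G1 (valid for |rho| <= 1).
    s = math.sqrt(max(0.0, 1.0 - rho * rho))
    mean = sum(w * h(x) for x, w in zip(nodes, weights))
    second = sum(w * h(x) ** 2 for x, w in zip(nodes, weights))
    cross = 0.0
    for g, wg in zip(nodes, weights):
        inner = sum(wz * h(rho * g + s * z) for z, wz in zip(nodes, weights))
        cross += wg * h(g) * inner
    return (cross - mean**2) / (second - mean**2)


def cube(x):
    return x**3


if __name__ == "__main__":
    for rho in (-1.0, -0.5, 0.0, 0.5, 1.0):
        print(rho, scaled_covariance(cube, rho), scaled_covariance(math.exp, rho))
```

For $\rho = -1$ the cubic map returns a value close to $-1$ while the exponential map returns a value close to $-e^{-1}$, which illustrates that the admissible range $[\xi^*, 1]$ depends on the marginal transformation.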

Example 3.40 Let $X(t)$, $t \ge 0$, be a real-valued translation process defined by $X(t) =
F^{-1} \circ \Phi(G(t))$, where $G(t)$ denotes a real-valued stationary Gaussian process with
mean 0, variance 1, and correlation function $\rho(\tau) = E[G(t + \tau) G(t)]$, and where $F$ is
an absolutely continuous distribution function with density $f$. The finite dimensional
densities of $X$ are

\[ f_n(x_1, \ldots, x_n; t_1, \ldots, t_n) = [(2\pi)^n \det(\rho)]^{-1/2} \exp\left( -\tfrac{1}{2}\, y^T \rho^{-1} y \right) \prod_{p=1}^{n} \frac{f(x_p)}{\phi(y_p)}, \tag{3.38} \]

where $\rho = \{E[G(t_p) G(t_q)]\}$, $y_p = \Phi^{-1} \circ F(x_p)$, $p, q = 1, \ldots, n$, $\phi$ is the density
of $N(0, 1)$, and $y = (y_1, \ldots, y_n)$. ◊

Proof The finite dimensional distribution $F_n$ of order $n$ of $X$ is the probability of
the event $\{G(t_1) \le y_1, \ldots, G(t_n) \le y_n\}$, and $f_n(x_1, \ldots, x_n; t_1, \ldots, t_n)$ results by
differentiation. Alternatively, $f_n$ can be obtained from the density of the Gaussian
vector $(G(t_1), \ldots, G(t_n))$ and the change of variable defining $X$, whose Jacobian is
$\prod_{p=1}^{n} dy_p/dx_p = \prod_{p=1}^{n} f(x_p)/\phi(y_p)$. ▲

3.7.3 Ergodic Random Functions 

Useful information on ergodic random functions can be found in [2] (Sect. 6.5), [7] 
(Sect. 2.6 ), and [17] (Chap. 13). This section only provides a brief introduction on 
ergodicity. 

Definition 3.25 An $\mathbb{R}^d$-valued stationary random function $X(t)$, $t \in \mathbb{R}^{d'}$, is called
ergodic if temporal/spatial averages of a particular sample coincide with $E[X(t)]$,
that is, temporal/spatial and ensemble averages coincide.

We give without proof two results for stationary Gaussian random functions providing
simple criteria for assessing whether a Gaussian function is or is not ergodic.
Note that translation random functions defined by continuous mappings of ergodic
Gaussian functions are ergodic.

Theorem 3.24 A stationary Gaussian random function is ergodic if and only if its 
spectral distribution function is continuous everywhere ([2], Theorem 6.5.3). 

Theorem 3.25 A stationary Gaussian random function is ergodic if its correlation
function $r$ satisfies the condition $r(\tau) \to 0$ as $\|\tau\| \to \infty$ ([2], Theorem 6.5.4).

A heuristic interpretation of the latter theorem is that for an ergodic Gaussian
function $X$ the random variables $X(t)$ and $X(t + \tau)$ are approximately independent
for a sufficiently large $\|\tau\|$.

Example 3.41 Let $X_k = Y + Y_k$, $k = 1, 2, \ldots$, be a time series, where
$\{Y_k\}$ are iid real-valued random variables independent of the random variable $Y$ and
$Y_1, Y \in L_2$. The time series $\{X_k\}$ is stationary but is not ergodic since its temporal
mean $(1/n) \sum_{k=1}^{n} X_k = Y + (1/n) \sum_{k=1}^{n} Y_k$ converges a.s. to $Y + E[Y_1]$ as
$n \to \infty$ by the strong law of large numbers (Example 2.32), so that it depends on
the particular sample of $Y$. ◊
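A short simulation makes the lack of ergodicity in Example 3.41 concrete. The sketch below assumes, purely for illustration, that $Y$ and $Y_k$ are standard Gaussian variables, so that $E[X_k] = 0$; the function names and the sample sizes are hypothetical choices.

```python
import random


def temporal_mean(y, n, rng):
    """Time average (1/n) * sum_k X_k of X_k = y + Y_k with Y_k iid N(0, 1)."""
    return sum(y + rng.gauss(0.0, 1.0) for _ in range(n)) / n


def simulate(n_samples=5, n=20000, seed=0):
    """Return pairs (Y, temporal mean) for independent sample paths of {X_k}."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_samples):
        y = rng.gauss(0.0, 1.0)  # a single draw of Y is shared by the whole path
        results.append((y, temporal_mean(y, n, rng)))
    return results


if __name__ == "__main__":
    for y, mean in simulate():
        print(f"Y = {y:+.3f}   temporal mean = {mean:+.3f}")
```

Each temporal mean settles near the value of $Y$ drawn for that path rather than near the ensemble mean $E[X_k] = 0$, so a single long record cannot recover the ensemble average.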

It is common in applications to consider weaker definitions for ergodicity, such
as ergodicity in the mean and mean square. For example, we say that a real-valued
process $X$ is ergodic in the mean if $\lim_{\tau \to \infty} \int_{-\tau}^{\tau} X(t)\, dt/(2\tau) = E[X(t)]$ with
probability one.

Example 3.42 A real-valued weakly stationary process $X$ with mean $\mu$ and covariance
function $c(h) = E[(X(t + h) - \mu)(X(t) - \mu)]$ is ergodic in the mean if

\[ \lim_{\tau \to \infty} \mathrm{Var}[\bar X_\tau] = \lim_{\tau \to \infty} \frac{1}{\tau} \int_{-\tau}^{\tau} (1 - |\alpha|/\tau)\, c(\alpha)\, d\alpha = 0. \tag{3.39} \]

Note that $X$ is ergodic in the mean if its covariance function is absolutely integrable
since $\left| \int_{-\tau}^{\tau} (1 - |\alpha|/\tau)\, c(\alpha)\, d\alpha \right| \le \int_{-\tau}^{\tau} |c(\alpha)|\, d\alpha \le \int_{-\infty}^{\infty} |c(\alpha)|\, d\alpha$, and the latter
integral is bounded by assumption. For example, $X$ with $c(h) = \exp(-\lambda |h|)$, $\lambda > 0$,
is ergodic in the mean since $\int_{-\infty}^{\infty} |c(\alpha)|\, d\alpha = 2/\lambda$. ◊

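Proof Let $\bar X_\tau = (1/\tau) \int_{-\tau/2}^{\tau/2} X(t)\, dt$ and $\tilde X_\tau = (1/\tau) \int_{-\tau/2}^{\tau/2} \tilde X(t)\, dt$, where $\tilde X(t) =
X(t) - \mu$. Note that $\bar X_\tau$ is an unbiased estimator for the mean of $X(t)$ since $E[\bar X_\tau] = \mu$.
The variance of $\bar X_\tau$ is

\[
\begin{aligned}
\mathrm{Var}[\bar X_\tau] = E[\tilde X_\tau^2]
&= \frac{1}{\tau^2} \int_{-\tau/2}^{\tau/2} \int_{-\tau/2}^{\tau/2} c(t - s)\, ds\, dt \\
&= \frac{1}{\tau^2} \left[ \int_{-\tau}^{0} c(\alpha)\, d\alpha \int_{-\tau/2 - \alpha}^{\tau/2} d\beta + \int_{0}^{\tau} c(\alpha)\, d\alpha \int_{-\tau/2}^{\tau/2 - \alpha} d\beta \right] \\
&= \frac{1}{\tau^2} \left[ \int_{-\tau}^{0} c(\alpha)(\tau + \alpha)\, d\alpha + \int_{0}^{\tau} c(\alpha)(\tau - \alpha)\, d\alpha \right] \\
&= \frac{1}{\tau^2} \int_{-\tau}^{\tau} (\tau - |\alpha|)\, c(\alpha)\, d\alpha = \frac{1}{\tau} \int_{-\tau}^{\tau} (1 - |\alpha|/\tau)\, c(\alpha)\, d\alpha
\end{aligned}
\]

by the change of variable $(\alpha = t - s, \beta = s)$. If $\mathrm{Var}[\bar X_\tau] \to 0$ as $\tau \to \infty$, a single
sample of $X$ suffices to find its mean provided the sample is infinitely long. ▲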
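The decay of $\mathrm{Var}[\bar X_\tau]$ in (3.39) can be examined numerically for the exponential covariance $c(h) = \exp(-\lambda|h|)$ of Example 3.42. The sketch below evaluates the integral by a midpoint rule and compares it with the closed form $(2/\tau)\,[1/\lambda - (1 - e^{-\lambda\tau})/(\lambda^2 \tau)]$, which is obtained here by direct integration and is not stated in the text. Function names and the number of quadrature cells are hypothetical choices.

```python
import math


def var_time_average(cov, tau, n=20000):
    """Midpoint-rule value of (1/tau) * int_{-tau}^{tau} (1 - |a|/tau) * cov(a) da."""
    step = 2.0 * tau / n
    total = 0.0
    for k in range(n):
        a = -tau + (k + 0.5) * step
        total += (1.0 - abs(a) / tau) * cov(a)
    return total * step / tau


def var_exponential(lam, tau):
    """Closed form of the same integral for cov(h) = exp(-lam * |h|)."""
    return (2.0 / tau) * (1.0 / lam - (1.0 - math.exp(-lam * tau)) / (lam**2 * tau))


if __name__ == "__main__":
    lam = 1.0
    for tau in (1.0, 10.0, 100.0):
        numeric = var_time_average(lambda a: math.exp(-lam * abs(a)), tau)
        print(tau, numeric, var_exponential(lam, tau))
```

The variance behaves like $2/(\lambda\tau)$ for large $\tau$, consistent with the bound $(1/\tau) \int_{-\infty}^{\infty} |c(\alpha)|\, d\alpha = 2/(\lambda\tau)$ used in Example 3.42.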


3.7.4 Markov Random Functions 

Markov chains and processes are defined and illustrated by examples. Markov ran- 
dom fields differ significantly from Markov chains and processes since time has a 
natural flow from past to future while space does not have a preferred evolution. We 
do not discuss Markov fields; useful information on this class of random functions 
can be found in [2] (Appendix), [18-21], and [7] (Sect. 8.4). 


3.7.4.1 Markov Chains 

Let $X$ be a real-valued Markov chain taking values in a finite set $S = \{1, 2, \ldots, q\}$
called state space. We refer to $X$ as a discrete time, discrete state Markov process
or discrete state Markov chain. Denote by $X_n$, $n = 0, 1, \ldots$, the state of $X$ at time
$n$, $\pi(n) = (\pi_1(n) = P(X_n = 1), \ldots, \pi_q(n) = P(X_n = q))'$ a $q$-dimensional
vector, and $p(m, n) = \{p_{ij}(m, n) = P(X_n = j \mid X_m = i),\ i, j = 1, \ldots, q\}$, $n \ge
m \ge 0$, a $(q, q)$-matrix of conditional probabilities ([22], Chaps. 5 and 6, and [23],
Chap. 2).

Definition 3.26 If $p(m, n)$ has the properties $p_{ij}(m, n) = P(X_n = j \mid X_m = i) \in
[0, 1]$ for all $i, j = 1, \ldots, q$ and $\sum_{j=1}^{q} p_{ij}(n, n + 1) = 1$ at all $n \ge 0$ and states
$i \in S$, it is called transition probability matrix. The second condition means that
$X_{n+1} \in S$ with probability 1 conditional on $\{X_n = i\}$, $i \in S$.

Theorem 3.26 If $\{X_n, n = 0, 1, \ldots\}$ is a Markov chain with values in $S =
\{1, 2, \ldots, q\}$, then

\[
\begin{aligned}
\pi(n)' &= \pi(m)'\, p(m, n), \quad n \ge m \ge 0, \quad \text{and} \\
p(k, n) &= p(k, m)\, p(m, n), \quad n \ge m \ge k \ge 0.
\end{aligned}
\tag{3.40}
\]

The second relationship is the Chapman-Kolmogorov equation.

Proof Conditional on $\{X_m = i\}$, we have $\{X_n = j\}$ with probability $p_{ij}(m, n)$
so that $P(X_n = j) = \sum_{i=1}^{q} p_{ij}(m, n)\, P(X_m = i)$, that is, the first relationship.
The equalities $P(X_n = j, X_m = l, X_k = i) = P(X_n = j \mid X_m = l, X_k =
i)\, P(X_m = l, X_k = i) = P(X_n = j \mid X_m = l)\, P(X_m = l, X_k = i)$ give
$p_{ij}(k, n) = \sum_{l=1}^{q} p_{lj}(m, n)\, p_{il}(k, m)$ by summation over all $l \in S$ and division by
$P(X_k = i)$, or $p(k, n) = p(k, m)\, p(m, n)$. ▲

Definition 3.27 A Markov chain with transition probability matrix $p(m, n)$, $n \ge
m \ge 0$, is said to be homogeneous if $p(m, n) = p(0, n - m)$, that is, its transition
matrix depends on the time lag $n - m$ rather than $m$ and $n$. By abuse of notation, we
use $p(n - m)$ for $p(0, n - m)$ and $p = p(1)$ for $p(0, 1)$.

Theorem 3.27 Let $\{X_n, n = 0, 1, \ldots\}$ be a homogeneous Markov chain taking
values in $S = \{1, 2, \ldots, q\}$ with transition probability matrix $p = p(1)$. Then
(Exercise 3.13)

\[
\begin{aligned}
& p(n) = p^n, \quad n = 1, 2, \ldots, \\
& p_{ij} \ge 0, \quad i, j \in S, \quad \text{such that } \sum_{j=1}^{q} p_{ij} = 1, \quad i \in S, \quad \text{and} \\
& p(n + m) = p(n)\, p(m), \quad n, m \ge 0.
\end{aligned}
\tag{3.41}
\]
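The relations in (3.40) and (3.41) are easy to verify with exact rational arithmetic. The sketch below uses a three-state transition matrix `P` chosen only for illustration (it does not come from the text), and the helper names `mat_mul`, `mat_pow`, and `propagate` are hypothetical.

```python
from fractions import Fraction as F


def mat_mul(a, b):
    """Product of two square matrices stored as lists of rows."""
    q = len(a)
    return [[sum(a[i][l] * b[l][j] for l in range(q)) for j in range(q)]
            for i in range(q)]


def mat_pow(p, n):
    """n-step transition matrix p(n) = p^n, with p(0) equal to the identity."""
    q = len(p)
    result = [[F(int(i == j)) for j in range(q)] for i in range(q)]
    for _ in range(n):
        result = mat_mul(result, p)
    return result


def propagate(pi0, p, n):
    """Row-vector update pi(n)' = pi(0)' p^n from (3.40)."""
    pn = mat_pow(p, n)
    return [sum(pi0[i] * pn[i][j] for i in range(len(pi0))) for j in range(len(pi0))]


# Illustrative transition matrix (an assumption, not taken from the text).
P = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0), F(1, 3), F(2, 3)]]

if __name__ == "__main__":
    # Chapman-Kolmogorov for a homogeneous chain: p(2 + 3) = p(2) p(3).
    print(mat_pow(P, 5) == mat_mul(mat_pow(P, 2), mat_pow(P, 3)))
    print(propagate([F(1), F(0), F(0)], P, 4))
```

Using `Fraction` avoids floating-point round-off, so the Chapman-Kolmogorov identity can be tested with exact equality.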

Example 3.43 Let $X_0 = 0$ and $X_n = \sum_{k=1}^{n} Y_k$, $n \ge 1$, where $Y_k$ are iid random
variables with probabilities $q_r = P(Y_1 = r)$, where $r \ge 0$ is an integer. Since
$P(X_{n+1} = j \mid X_0, X_1, \ldots, X_n) = P(Y_{n+1} = j - X_n) = q_{j - X_n}$, $\{X_n, n =
0, 1, \ldots\}$ is a Markov chain with transition probabilities $p_{ij} = P(X_{n+1} = j \mid X_n =
i) = q_{j-i}$. ◊

Definition 3.28 Let $\{X_n, n = 0, 1, \ldots\}$ be a Markov chain with state space $S$. Denote
by $T_i$ the time of the first visit to state $i$ and by $N_i$ the total number of visits to $i$.
The state $i$ is recurrent if $P(T_i < \infty) = 1$; otherwise, if $P(T_i = +\infty) > 0$, state $i$
is said to be transient. A recurrent state $i$ is null if $E[T_i] = \infty$; otherwise, it is called
non-null. A recurrent state is periodic with period $r$ if $r \ge 2$ is the largest integer for
which $P(T_i = nr \text{ for some } n \ge 1) = 1$; otherwise, if there is no such $r \ge 2$, $i$ is
called aperiodic.

Definition 3.29 A collection of states $C \subset S$ is closed if no state $l \notin C$ can be
reached from a state in $C$, that is, $P(X_{n+1} \notin C \mid X_n = i) = 0$ for all $i \in C$. If
$C = \{i\}$ consists of a single state, then $i$ is called an absorbing state. A closed $C$ is
irreducible if no proper subset of it is closed. A Markov chain is called irreducible if
its only closed subset is the set of all states.

Example 3.44 Let $\{X_n, n = 0, 1, \ldots\}$ be a homogeneous Markov chain with transition
probability matrix

\[ p = \begin{bmatrix} 0.50 & 0.25 & 0.25 & 0.00 \\ 0.00 & 0.50 & 0.50 & 0.00 \\ 0.50 & 0.00 & 0.50 & 0.00 \\ 0.00 & 0.50 & 0.50 & 0.00 \end{bmatrix}. \]

State 4 is transient since it cannot be reached. The collections of states $\{1, 2, 3, 4\}$
and $\{1, 2, 3\}$ are closed since, for example, $X_n \in \{1, 2, 3\}$ will never leave this set
of states. The states 1, 2, and 3 are not periodic since they can be accessed at every
time step. ◊

When dealing with homogeneous Markov chains we may be interested in their
long-term behavior, that is, the probability law of $X_n$ as $n \to \infty$. If the probability
law $\pi(n)$ of $X_n$ becomes invariant as $n \to \infty$, then $\pi_{st} = \lim_{n \to \infty} \pi(n)$ exists
and is called stationary distribution. Conditions for the existence of the stationary
distribution can be found in [23] (Sect. 2.12). The following theorem provides a
method for calculating stationary distributions.

Theorem 3.28 Let $\{X_n, n = 0, 1, \ldots\}$ be a homogeneous Markov chain with
state space $S = \{1, 2, \ldots, q\}$ and transition probability matrix $p = p(1)$ that
is irreducible and aperiodic. Then the stationary probability $\pi_{st}$ is the solution of
$\pi_{st}' = (1, \ldots, 1)(I - p + 1_q)^{-1}$, where $I$ is the identity matrix and $1_q$ is a $(q, q)$-matrix
with all entries equal to 1 ([23], Proposition 2.14.1). The formula follows from
$\pi_{st}'(I - p) = 0$ and $\pi_{st}'\, 1_q = (1, \ldots, 1)$.

Example 3.45 Let $\{X_n, n = 0, 1, \ldots\}$ be a homogeneous Markov chain with transition
probability matrix

\[ p = \begin{bmatrix} 0 & 1/3 & 2/3 \\ 1/3 & 0 & 2/3 \\ 1 & 0 & 0 \end{bmatrix}. \]

The stationary distribution of the chain has been calculated from $\pi_{st}' = (1, \ldots, 1)
(I - p + 1_q)^{-1}$ and is $\pi_{st}(1) = 0.45$, $\pi_{st}(2) = 0.15$, and $\pi_{st}(3) = 0.40$. ◊
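The stationary distribution of Example 3.45 can be reproduced by solving the linear system $\pi_{st}'(I - p + 1_q) = (1, \ldots, 1)$, which combines the invariance condition $\pi_{st}' p = \pi_{st}'$ with the normalization $\sum_i \pi_{st}(i) = 1$. The sketch below is a minimal illustration using exact rational arithmetic; the Gauss-Jordan helper `solve` and the function `stationary` are hypothetical names, not a reference implementation.

```python
from fractions import Fraction as F


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with exact rational arithmetic."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[r][n] for r in range(n)]


def stationary(p):
    """Stationary distribution from pi' (I - p + 1_q) = (1, ..., 1)."""
    q = len(p)
    # Transpose the system: (I - p + 1_q)' pi = (1, ..., 1)'.
    a_t = [[F(int(i == j)) - p[j][i] + 1 for j in range(q)] for i in range(q)]
    return solve(a_t, [F(1)] * q)


# Transition probability matrix of Example 3.45.
P = [[F(0), F(1, 3), F(2, 3)],
     [F(1, 3), F(0), F(2, 3)],
     [F(1), F(0), F(0)]]

if __name__ == "__main__":
    print([float(x) for x in stationary(P)])
```

The computed vector can be checked directly against the invariance condition $\pi_{st}' p = \pi_{st}'$, which is an independent test of the values reported in the example.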


3.7.4.2 Markov Processes 

Definition 3.30 Let $X(t)$, $t \ge 0$, be a real-valued stochastic process. If for every
integer $n > 1$ and times $0 \le t_1 < \cdots < t_n$ the conditional random variables
$X(t_n) \mid X(t_{n-1})$ and $(X(t_{n-2}), \ldots, X(t_1)) \mid X(t_{n-1})$ are independent, then $X$ is a
Markov process, that is, the past $(X(t_{n-2}), \ldots, X(t_1))$ is independent of the future $X(t_n)$
conditional on the present $X(t_{n-1})$. The definition extends directly to vector-valued
processes.

The condition $E[g(X(t + \tau)) \mid \mathcal{F}_t] = E[g(X(t + \tau)) \mid X(t)]$, required to be satisfied
for arbitrary $t, \tau \ge 0$ and Borel functions $g : \mathbb{R} \to \mathbb{R}$, provides an alternative
definition for Markov processes ([24], p. 156), where $\mathcal{F}_t = \sigma(X(s), 0 \le s \le t)$.

Definition 3.31 The transition density of $X$ is the density of the conditional random
variable $X(t) \mid X(s)$, $0 \le s < t$.

Theorem 3.29 The finite dimensional densities of a Markov process $X$ can be
obtained from its transition density and its marginal density at the initial time. Moreover,
the random variables $X(t_n) \mid (X(t_{n-1}), \ldots, X(t_1))$ and $X(t_n) \mid X(t_{n-1})$,
$t_1 < \cdots < t_n$, have the same distribution.

Proof Let $f_k(x_1, \ldots, x_k)$ denote the density of $(X(t_1), \ldots, X(t_k))$, $k \ge 2$,
$t_1 < \cdots < t_k$, and $f(x_n \mid x_{n-1})$ the density of $X(t_n) \mid X(t_{n-1})$. We have
$f(x_n, x_{n-2}, \ldots, x_1 \mid x_{n-1}) = f(x_{n-2}, \ldots, x_1 \mid x_{n-1})\, f(x_n \mid x_{n-1})$ since, conditional
on the present, past and future are independent. This implies $f(x_n \mid x_{n-1}) =
f(x_n, \ldots, x_1)/f(x_{n-1}, \ldots, x_1) = f(x_n \mid x_{n-1}, \ldots, x_1)$ so that $f(x_n, \ldots, x_1) =
f(x_n \mid x_{n-1}, \ldots, x_1)\, f(x_{n-1}, \ldots, x_1) = f(x_n \mid x_{n-1})\, f(x_{n-1}, \ldots, x_1)$. Repeated
use of the latter formula gives $f_n(x_1, \ldots, x_n) = f(x_1) \prod_{i=2}^{n} f(x_i \mid x_{i-1})$, where
$f(x_1)$ denotes the density of $X(t_1)$. The latter formula gives $f_n(x_1, \ldots, x_n)/
f_{n-1}(x_1, \ldots, x_{n-1}) = f(x_n \mid x_{n-1})$, that is, the second part of the theorem. ▲

Theorem 3.30 Let $f_{X(v)|X(u)}(\cdot \mid \cdot)$, $v > u$, denote the density of $X(v) \mid X(u)$. For
$t_0 < s < t$ the conditional densities of $X(t) \mid X(t_0)$, $X(t) \mid X(s)$, and $X(s) \mid X(t_0)$
are related by

\[ f_{X(t)|X(t_0)}(x \mid x_0) = \int_{\mathbb{R}} f_{X(t)|X(s)}(x \mid y)\, f_{X(s)|X(t_0)}(y \mid x_0)\, dy, \tag{3.42} \]

referred to as the Chapman-Kolmogorov equation (Exercise 3.14).

Example 3.46 The time series in Example 3.2 is Markov since
$X_i \mid (X_{i-1}, X_{i-2}, \ldots) \stackrel{d}{=} X_i \mid X_{i-1} \sim N(\rho X_{i-1}, 1 - \rho^2)$.
This Markov process has discrete time and
continuous state since the index time set $I$ is countable and the state can take any
value on the real line. ◊


3.7.5 Processes with Independent Increments 

This class of stochastic processes is essential for our discussion on stochastic differ- 
ential equations. Considerations in this section are limited to real-valued stochastic 
processes. Extension to vector-valued processes is direct. 

Definition 3.32 A real-valued process $X(t)$, $t \ge 0$, has independent increments if
the random variables $X(t) - X(v)$ and $X(u) - X(s)$ are independent for all $s <
u \le v < t$. A process is said to have stationary independent increments if it has
independent increments and the distribution of $X(t) - X(s)$, $t > s$, depends only
on the time lag $t - s$, rather than $s$ and $t$.

Brownian motion is an example of a process with stationary independent incre- 
ments (Definition 3.3). The Poisson and the compound Poisson processes introduced 
in Example 3.5 also have stationary independent increments. 

Theorem 3.31 The finite dimensional densities of a real-valued process $X(t)$, $t \ge 0$,
with $X(0) = 0$ and independent increments are

\[ f_n(x_1, \ldots, x_n; t_1, \ldots, t_n) = f_{Y_1}(x_1)\, f_{Y_2}(x_2 - x_1) \cdots f_{Y_n}(x_n - x_{n-1}), \tag{3.43} \]

where $n \ge 1$ is an integer, $0 = t_0 < t_1 < t_2 < \cdots < t_n$ denote arbitrary times,
$f_n(x_1, \ldots, x_n; t_1, \ldots, t_n)$ is the density of $(X(t_1), \ldots, X(t_n))$, and $f_{Y_i}$ is the density
of $Y_i = X(t_i) - X(t_{i-1})$, $i = 1, \ldots, n$.

Proof Since $P(X(t_1) \le x_1, \ldots, X(t_n) \le x_n) = P(Y_1 \le x_1, Y_1 + Y_2 \le
x_2, \ldots, \sum_{k=1}^{n} Y_k \le x_n)$, we have

\[
\begin{aligned}
P(X(t_1) \le x_1, \ldots, X(t_n) \le x_n)
= {} & \int_{-\infty}^{x_1} dy_1\, f_{Y_1}(y_1) \int_{-\infty}^{x_2 - y_1} dy_2\, f_{Y_2}(y_2) \cdots \\
& \int_{-\infty}^{x_{n-1} - \sum_{k=1}^{n-2} y_k} dy_{n-1}\, f_{Y_{n-1}}(y_{n-1}) \int_{-\infty}^{x_n - \sum_{k=1}^{n-1} y_k} dy_n\, f_{Y_n}(y_n).
\end{aligned}
\]

The finite dimensional densities of $X$ result by differentiation of the above equation or
from the density of the random vector $(Y_1, \ldots, Y_n)$ and the mapping $(Y_1, \ldots, Y_n) \mapsto
(X(t_1), \ldots, X(t_n))$. ▲

Theorem 3.32 If $X(t)$, $t \ge 0$, is a real-valued process with stationary independent
increments and finite variance, then (1) $X$ is a Markov process, (2) $\mu(t) = E[X(t)]$
and $\sigma(t)^2 = E[(X(t) - \mu(t))^2]$ are linear functions of $t$ provided $X(0) = 0$, that is,
$\mu(s + t) = \mu(s) + \mu(t)$ and $\sigma(s + t)^2 = \sigma(s)^2 + \sigma(t)^2$, and (3) $f_{\tau_1 + \tau_2} = f_{\tau_2} * f_{\tau_1}$
and $\varphi_{\tau_1 + \tau_2} = \varphi_{\tau_1} \varphi_{\tau_2}$, where $\tau_1, \tau_2 > 0$, $f_\tau$ and $\varphi_\tau$ denote the density and the
characteristic function of $X(t + \tau) - X(t)$, $\tau > 0$, and $*$ denotes convolution.

Proof That $X$ is Markov follows from (3.43) showing that the conditional random
variables $X(t_n) \mid (X(t_{n-1}), \ldots, X(t_1))$ and $X(t_n) \mid X(t_{n-1})$ have the same density.
Since $X(t + \tau_1 + \tau_2) - X(t) = (X(t + \tau_1 + \tau_2) - X(t + \tau_1)) + (X(t + \tau_1) - X(t))$
and $X(t + \tau_1 + \tau_2) - X(t + \tau_1)$ is independent of $X(t + \tau_1) - X(t)$, we have
$f_{\tau_1 + \tau_2} = f_{\tau_2} * f_{\tau_1}$. Similar arguments imply $\varphi_{\tau_1 + \tau_2} = \varphi_{\tau_1} \varphi_{\tau_2}$ and the stated properties
of the mean and variance of $X$. ▲

Example 3.47 The finite dimensional density of order $n$ of a Brownian motion $B$ is

\[ f_n(x_1, \ldots, x_n; t_1, \ldots, t_n) = \frac{1}{\sqrt{t_1}}\, \phi\!\left( \frac{x_1}{\sqrt{t_1}} \right) \frac{1}{\sqrt{t_2 - t_1}}\, \phi\!\left( \frac{x_2 - x_1}{\sqrt{t_2 - t_1}} \right) \cdots \frac{1}{\sqrt{t_n - t_{n-1}}}\, \phi\!\left( \frac{x_n - x_{n-1}}{\sqrt{t_n - t_{n-1}}} \right), \tag{3.44} \]

where $0 < t_1 < \cdots < t_n$ and $\phi$ is the density of $N(0, 1)$. Let $C(t) = \sum_{k=1}^{N(t)} Y_k$ be a
compound Poisson process, where $\lambda > 0$ is the intensity of the underlying Poisson process
$N$ and $\{Y_k\}$ are iid real-valued random variables. The characteristic function of
$(C(t_1), \ldots, C(t_n))$ can be
obtained simply from the characteristic function of the increments $C(t) - C(s)$, $t > s$,
of this process, that is, the function $E[\exp(i u (C(t) - C(s)))] = \exp(-\lambda (t - s)(1 -
\varphi_Y(u)))$, where $\varphi_Y(u) = E[\exp(i u Y_1)]$. ◊

Proof The density in (3.44) follows from (3.43) and the independence of increments
of $B$. The characteristic function of $C(t) - C(s) \stackrel{d}{=} C(t - s)$, $t > s$, is

\[
\begin{aligned}
E[e^{i u C(t - s)}] &= \sum_{n=0}^{\infty} E[e^{i u C(t - s)} \mid N(t - s) = n]\, P(N(t - s) = n) \\
&= e^{-\lambda (t - s)} \sum_{n=0}^{\infty} (\varphi_Y(u))^n\, \frac{[\lambda (t - s)]^n}{n!} = e^{-\lambda (t - s)(1 - \varphi_Y(u))}
\end{aligned}
\]

since $E[\exp(i u C(t - s)) \mid N(t - s) = n] = E[\exp(i u \sum_{k=1}^{n} Y_k)] = \varphi_Y(u)^n$
and $P(N(t - s) = n) = [\lambda (t - s)]^n e^{-\lambda (t - s)}/n!$. The characteristic function of
$(C(t_1), \ldots, C(t_n))$ can be obtained from that of $C(t) - C(s)$ since, for example, the
characteristic function of $(C(t_1), C(t_2), C(t_3))$ has the expression

\[
\begin{aligned}
& E\{\exp[i (u_1 C(t_1) + u_2 C(t_2) + u_3 C(t_3))]\} \\
& \quad = E\{\exp[i ((u_1 + u_2 + u_3) C(t_1) + (u_2 + u_3)(C(t_2) - C(t_1)) + u_3 (C(t_3) - C(t_2)))]\},
\end{aligned}
\]

that depends on the characteristic functions of increments of $C$. ▲
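The two steps of this proof, conditioning on the number of jumps and then factoring the joint characteristic function over independent increments, can be sketched numerically. The code below assumes, for illustration only, that the jumps $Y_k$ are $N(0, 1)$ so that $\varphi_Y(u) = \exp(-u^2/2)$; all function names are hypothetical.

```python
import cmath
import math


def cf_gaussian_jump(u):
    """Characteristic function of Y ~ N(0, 1) (an illustrative choice of jump law)."""
    return math.exp(-0.5 * u * u)


def cf_increment_series(u, lam, dt, cf_jump, n_terms=200):
    """Sum over n of E[exp(iuC) | N = n] * P(N = n) for an increment over a lag dt."""
    total = 0j
    term = cmath.exp(-lam * dt)  # n = 0 term, equal to P(N = 0)
    for n in range(n_terms):
        total += term
        term *= lam * dt * cf_jump(u) / (n + 1)  # ratio of consecutive terms
    return total


def cf_increment_closed(u, lam, dt, cf_jump):
    """Closed form exp(-lam * dt * (1 - cf_jump(u)))."""
    return cmath.exp(-lam * dt * (1.0 - cf_jump(u)))


def cf_joint(us, times, lam, cf_jump):
    """Characteristic function of (C(t_1), ..., C(t_n)), t_1 < ... < t_n, via increments."""
    value, prev = 1 + 0j, 0.0
    for j, t in enumerate(times):
        # The increment C(t_j) - C(t_{j-1}) is multiplied by u_j + ... + u_n.
        value *= cf_increment_closed(sum(us[j:]), lam, t - prev, cf_jump)
        prev = t
    return value


if __name__ == "__main__":
    print(cf_increment_series(0.7, 2.0, 1.5, cf_gaussian_jump))
    print(cf_increment_closed(0.7, 2.0, 1.5, cf_gaussian_jump))
    print(cf_joint([0.3, -0.2, 0.5], [1.0, 2.0, 3.5], 2.0, cf_gaussian_jump))
```

The series and the closed form agree to machine precision because the conditional sum is the power series of the exponential function.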


3.7.6 Continuous Time Martingales 

This class of processes is relevant for a broad range of topics considered in the book, 
including the characterization of the state of physical systems subjected to random 
noise. Following general considerations on continuous time martingales, we discuss 
Brownian motion, Poisson, and Levy processes, and define white noise processes. 

The definition and many of the properties of continuous time martingales are 
similar to those for discrete time martingales. We only provide a brief introduction 
on continuous time martingales. The reader may consult [3] (Chaps. 2 and 5), [24] 
(Chap. 2), [6] (Chap. 1), and [25] (Chap. 4) for additional information on this topic. 

Definition 3.33 A real-valued stochastic process $X$ defined on a filtered probability
space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, P)$ is an $\mathcal{F}_t$-martingale if (1) $E[|X(t)|] < \infty$ for all $t \ge 0$,
(2) $X$ is $\mathcal{F}_t$-adapted, that is, $X(t) \in \mathcal{F}_t$ for all $t \ge 0$, and (3) $E[X(t) \mid \mathcal{F}_s] = X(s)$
for all $s \le t$. Condition (3) can be replaced by $E[X(t) \mid \mathcal{F}_s] = X(s \wedge t)$ for all
$t, s \ge 0$. Submartingales, supermartingales, and $p$-integrable martingales are defined
as for discrete-time processes (Sect. 2.12).

Example 3.48 Let $Y$ be a random variable with finite mean defined on a probability
space $(\Omega, \mathcal{F}, P)$ with a filtration $(\mathcal{F}_t)_{t \ge 0}$. The process $X(t) = E[Y \mid \mathcal{F}_t]$, $t \ge 0$,
is an $\mathcal{F}_t$-martingale. ◊

Proof Jensen's inequality for random variables gives $E[|X(t)|] = E\{|E[Y \mid
\mathcal{F}_t]|\} \le E\{E[|Y| \mid \mathcal{F}_t]\} = E[|Y|]$, so that $X(t)$ is integrable. $X$ is $\mathcal{F}_t$-adapted
since $E[Y \mid \mathcal{F}_t]$ is $\mathcal{F}_t$-measurable. Properties of the conditional expectation give
$E[X(t) \mid \mathcal{F}_s] = E\{E[Y \mid \mathcal{F}_t] \mid \mathcal{F}_s\} = E[Y \mid \mathcal{F}_s] = X(s)$ for all $s \le t$. ▲

Example 3.49 Let $C(t) = \sum_{k=1}^{N(t)} Y_k$, $t \ge 0$, be a compound Poisson process, where
$\{Y_k\}$ are iid random variables with mean 0 and $N(t)$ is a homogeneous Poisson process
with intensity $\lambda > 0$. Then $C$ is a martingale with respect to its natural filtration $\mathcal{F}_t$. ◊

Proof Since $E[C(t)] = E\{E[C(t) \mid N(t)]\}$, $E[C(t) \mid N(t)] = 0$, and $C(t) \in \mathcal{F}_t$,
the first two defining properties for martingales are satisfied. The third property holds
since, for $s \le t$, $E[C(t) \mid \mathcal{F}_s] = E[(C(t) - C(s)) + C(s) \mid \mathcal{F}_s] = E[C(t) -
C(s)] + C(s)$ since $C(t) - C(s)$ is independent of $\mathcal{F}_s$ and $C(s)$ is $\mathcal{F}_s$-measurable.
This gives $E[C(t) \mid \mathcal{F}_s] = C(s)$ since $E[C(t) - C(s)] = 0$ by assumption. ▲

Definition 3.34 A real-valued process $X$ defined on a filtered probability space
$(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, P)$ is an $\mathcal{F}_t$-local martingale, submartingale, or supermartingale
if there exists an increasing sequence $T_n$, $n = 1, 2, \ldots$, of $\mathcal{F}_t$-stopping times,
that is, $T_n \le T_{n+1}$, such that (1) $\lim_{n \to \infty} T_n = +\infty$ a.s. and (2) the stopped process
$X^{T_n}(t) = X(t \wedge T_n)$ is an $\mathcal{F}_t$-martingale, submartingale, or supermartingale, respectively,
for each $n$. A sequence $T_n$ with these properties is called a localizing sequence.
Recall that $T$ is a stopping time if $\{T \le t\} \in \mathcal{F}_t$ for all $t \ge 0$.

Martingales, submartingales, and supermartingales are local martingales, submartingales,
and supermartingales, respectively, since $T_n = \infty$, $n \ge 1$, is a localizing
sequence for them. We will use local martingales to define stochastic integrals
(Sect. 4.4.3).

Theorem 3.33 Increments of a martingale $X$ over non-overlapping intervals are
orthogonal provided $X(t) \in L_2$ for all $t \ge 0$.

Proof The expectation $E[(X(t) - X(s))(X(v) - X(u))]$, $u < v \le s < t$, is equal
to $E\{E[(X(t) - X(s))(X(v) - X(u)) \mid \mathcal{F}_s]\} = E\{(X(v) - X(u))\, E[X(t) - X(s) \mid \mathcal{F}_s]\}$,
which is 0 since $X$ is a martingale so that $E[X(t) - X(s) \mid \mathcal{F}_s] = 0$. ▲

Definition 3.35 Let $\mathcal{P}$ and $\mathcal{O}$ be the smallest $\sigma$-fields on $[0, \infty) \times \Omega$ with respect
to which all left and right continuous $\mathcal{F}_t$-adapted processes are measurable, respectively.
A process $X$ is called predictable if the mapping $(t, \omega) \mapsto X(t, \omega)$ is
$\mathcal{P}$-measurable. If this mapping is $\mathcal{O}$-measurable, then $X$ is said to be optional.

Predictable processes allow a peek into the future since they are left continuous.
Note also that predictable and optional processes are measurable since the $\sigma$-fields
$\mathcal{P}$ and $\mathcal{O}$ are included in $\mathcal{B}([0, \infty)) \times \mathcal{F}$ and that $\mathcal{P} \subset \mathcal{O}$ ([5], Sect. 3.11.1). The
following theorem stated without proof gives an important property of martingales.

Theorem 3.34 Every martingale has a unique modification whose samples are right 
continuous with left limits a.s., that is, every martingale has an optional modification 
([6], Corollary 1, p. 8). 

Theorem 3.35 (Jensen’s inequality) If X is an $\mathcal{F}_t$-martingale and $\varphi : \mathbb{R} \to \mathbb{R}$ is
a convex function such that $E[|\varphi(X(t))|] < \infty$, $t \ge 0$, then
$E[\varphi(X(t)) \mid \mathcal{F}_s] \ge \varphi(X(s))$, $s \le t$.

Proof Convex functions are continuous so that $\varphi(X(t))$ is a random variable. Since
$\varphi$ is a convex function, we have $\varphi(x) = \sup\{l(x) : l(u) \le \varphi(u), \forall u\}$, where
$l(u) = a u + b$ and a, b are real constants. Then

$$E[\varphi(X(t)) \mid \mathcal{F}_s] = E[\sup\{l(X(t))\} \mid \mathcal{F}_s] \ge \sup\{E[l(X(t)) \mid \mathcal{F}_s]\}
= \sup\{l(E[X(t) \mid \mathcal{F}_s])\} = \sup\{l(X(s))\} = \varphi(X(s))$$

by Theorem 2.11, linearity of the expectation operator, and the martingale property
of X(t). ▲

Table 3.1 Defining properties for Brownian motion, Poisson, and Lévy processes

Brownian motion          Poisson                      Lévy
-------------------------------------------------------------------------------
$\mathcal{F}_t$-adapted starting at zero (all three processes)
Stationary increments that are independent of the past (all three processes)
Gaussian increments      Counting process             Continuous in
                         without explosions           probability

In the remainder of this section we define Brownian motion, Poisson, and Lévy
processes, use these processes to illustrate properties of continuous time
martingales, and develop preliminary tools for solving stochastic equations. Table 3.1,
adapted from [5] (Sect. 3.14), summarizes the defining properties of Brownian motion,
Poisson, and Lévy processes.


3.7.6.1 Brownian Motion 

The construction of a Brownian motion process $B(t)$, $0 \le t \le 1$, may involve the
following three steps ([23], Sect. 6.3). First, select a sequence of refining partitions
$p_n = \{k/2^n : k = 0, 1, \ldots, 2^n\}$ in [0, 1], where $n \ge 1$ is an integer. Second,
define a sequence of processes $B^{(n)}$ in [0, 1] such that (i) the random variables
$B^{(n)}(k/2^n) - B^{(n)}((k-1)/2^n)$ are independent copies of $N(0, 1/2^n)$ and (ii) the
samples of $B^{(n)}$ are linear in each interval $[(k-1)/2^n, k/2^n]$. Third, define $B(t)$ as
the limit of the sequence of processes $B^{(n)}(t)$ as $n \to \infty$. The limit process $B(t)$ has
continuous samples and stationary independent Gaussian increments.
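The three steps above can be sketched numerically. The following Python sketch is not taken from the text: it assumes the standard midpoint-displacement rule, in which the value at a new dyadic midpoint is the average of its two neighbours plus an independent Gaussian perturbation with variance $1/2^{n+2}$, so that the increments of $B^{(n)}$ over the grid $k/2^n$ are independent $N(0, 1/2^n)$ variables. The names `refine` and `dyadic_brownian` are illustrative.

```python
import math
import random


def refine(values, level, rng):
    """Refine grid values on k/2**level to the finer grid k/2**(level + 1)."""
    # Given its two neighbours, a midpoint value is Gaussian with mean equal to
    # their average and variance 1/2**(level + 2) (assumed refinement rule).
    std = math.sqrt(1.0 / 2 ** (level + 2))
    refined = []
    for left, right in zip(values[:-1], values[1:]):
        refined.append(left)
        refined.append(0.5 * (left + right) + std * rng.gauss(0.0, 1.0))
    refined.append(values[-1])
    return refined


def dyadic_brownian(n_levels, seed=0):
    """Values of B^(n) at the grid points k/2**n, k = 0, ..., 2**n, n = n_levels."""
    rng = random.Random(seed)
    values = [0.0, rng.gauss(0.0, 1.0)]  # B(0) = 0 and B(1) ~ N(0, 1)
    for level in range(n_levels):
        values = refine(values, level, rng)
    return values


path = dyadic_brownian(8)
print(len(path), path[-1])
```

Linear interpolation between the returned grid values gives a sample of $B^{(n)}$. Each refinement keeps the values already generated on the coarser grid, which is what makes the sequence of partitions refining.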

Definition 3.36 Let $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, P)$ be a filtered probability space. A process
$B(t)$, $t \ge 0$, defined on this space is a Brownian motion if (1) it is $\mathcal{F}_t$-adapted and
starts at zero, (2) it has stationary increments that are independent of the past, that
is, $B(t) - B(s)$, $t > s$, is independent of $\mathcal{F}_s$, and (3) $B(t) - B(s) \sim N(0, t - s)$
for $t > s$ (Table 3.1).

A similar definition holds for $\mathbb{R}^d$-valued Brownian motions. In this case, the
increments $B(t) - B(s)$, $t > s$, are Gaussian vectors with mean zero and covariance
matrix $\gamma (t - s)$, where $\gamma$ is a $(d, d)$ positive definite matrix. We use for B a
modification of Brownian motion with a.s. continuous samples that is guaranteed to
exist ([6], Theorem 26). Note that independence of the past is a stronger requirement
than that of independent increments. Let X be a process adapted to a filtration
$\mathcal{F}_t$, $t \ge 0$, and take $0 \le u < v \le s < t$. Since $X(t) - X(s)$ is independent of $\mathcal{F}_s$
by hypothesis, it is also independent of $X(v) - X(u) \in \mathcal{F}_s$.

Theorem 3.36 Brownian motion is measurable, that is, the function $B : I \times \Omega \to \mathbb{R}$
is measurable from $(I \times \Omega, \mathcal{B}(I) \times \mathcal{F})$ to $(\mathbb{R}, \mathcal{B})$, where I is a bounded interval.

Proof Set I = [0, 1] and let $B^{(n)}(t, \omega) = B(k 2^{-n}, \omega)$, $t \in ((k-1) 2^{-n}, k 2^{-n}]$,
$k = 1, \ldots, 2^n$, with $B^{(n)}(0, \omega) = 0$. The process $B^{(n)}$ has piecewise constant
sample paths, depends on a countable number of values of B, and can be viewed
as a random walk with time step $\Delta t = 2^{-n}$ defined by a sum of independent
Gaussian variables with mean zero and variance $2^{-n}$ for each n, so that the function
$(t, \omega) \mapsto B^{(n)}(t, \omega)$ is measurable from $(I \times \Omega, \mathcal{B}(I) \times \mathcal{F})$ to $(\mathbb{R}, \mathcal{B})$. The measurable
mapping $(t, \omega) \mapsto B^{(n)}(t, \omega)$ converges to $(t, \omega) \mapsto B(t, \omega)$ pointwise by the
continuity of the sample paths of B so that the function $B(t, \omega) = \lim_{n \to \infty} B^{(n)}(t, \omega)$
is measurable, since limits of measurable functions are measurable. ▲

Example 3.50 Let $p_n = (0 = t_0 < t_1 < \cdots < t_n = t)$ be a sequence of partitions
of [0, t] such that its mesh $\Delta(p_n) = \max_{1 \le k \le n} (t_k - t_{k-1}) \to 0$ as $n \to \infty$. Then
$E[\sum_{k=1}^{n} (\Delta B_k)^2] = t$ for each n and $\sum_{k=1}^{n} (\Delta B_k)^2 \to t$ in m.s. as $n \to \infty$, where
$\Delta B_k = B(t_k) - B(t_{k-1})$. ◊

Proof Since $\Delta B_k \sim N(0, t_k - t_{k-1})$ are independent random variables, we have
$E[\sum_{k=1}^{n} (\Delta B_k)^2] = \sum_{k=1}^{n} E[(\Delta B_k)^2] = \sum_{k=1}^{n} (t_k - t_{k-1}) = t$. For
$\Delta t_k = t_k - t_{k-1}$ we have

$$E\Big[\Big(\sum_{k=1}^{n} (\Delta B_k)^2 - t\Big)^2\Big] = \sum_{k,l=1}^{n} E[(\Delta B_k)^2 (\Delta B_l)^2] - 2t \sum_{k=1}^{n} E[(\Delta B_k)^2] + t^2$$
$$= 3 \sum_{k=1}^{n} (\Delta t_k)^2 + \sum_{k,l=1, k \ne l}^{n} \Delta t_k \Delta t_l - t^2 = 2 \sum_{k=1}^{n} (\Delta t_k)^2,$$

and $\sum_{k=1}^{n} (\Delta t_k)^2 \le (\max_k \Delta t_k) \sum_{k=1}^{n} \Delta t_k = (\max_k \Delta t_k) t$, so that
$E[(\sum_{k=1}^{n} (\Delta B_k)^2 - t)^2] \to 0$ as $n \to \infty$. ▲

Theorem 3.37 If $p_n$ is a sequence of refining partitions of [0, t] such that $\Delta(p_n) \to 0$
as $n \to \infty$, then $\lim_{n \to \infty} \sum_{k=1}^{n} (\Delta B_k)^2 = t$ a.s., where $\Delta B_k = B(t_k) - B(t_{k-1})$
([6], Theorem 28, p. 18).

The process $[B](s) = \sum_{k=1}^{n} (B(t_k \wedge s) - B(t_{k-1} \wedge s))^2$, $s \in [0, t]$, defined by the
sum of squared increments of Brownian motion, is referred to as the quadratic
variation of B, and can be defined for more general processes as we will see in a
subsequent chapter. Theorem 3.37 justifies the notation $(dB(t))^2 = dt$ used
frequently in applications. The left panel of Fig. 3.6 shows three samples of $[B](s)$ for
$s \in [0, t]$, t = 1, and $t_k = k/n$, k = 0, 1, ..., n, corresponding to n = 10, 100, and
1000, that is, refining partitions. The right panel in the figure shows three samples
of $[B](s)$ for n = 1000. If n is small, the samples of $[B](s)$ differ from each other and
from the identity function $t \mapsto t$. These samples nearly coincide with $t \mapsto t$ for
n = 1000.
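The behavior described above can be reproduced with a few lines of code. The sketch below is an illustration, not the procedure used for Fig. 3.6: it draws independent $N(0, t/n)$ increments for each n (so the samples for different n are unrelated paths), sums their squares, and reports the mean square error $2 \sum_k (\Delta t_k)^2 = 2 t^2 / n$ from Example 3.50. The function names are hypothetical.

```python
import math
import random


def squared_increment_sum(n, t=1.0, seed=0):
    """Sum of squared Brownian increments over the partition t_k = k t / n."""
    rng = random.Random(seed)
    std = math.sqrt(t / n)  # each increment is N(0, t/n)
    return sum(rng.gauss(0.0, std) ** 2 for _ in range(n))


def mean_square_error(n, t=1.0):
    """E[(sum - t)^2] = 2 * sum_k (dt_k)^2 = 2 t^2 / n for equal time steps."""
    return 2.0 * t * t / n


for n in (10, 100, 1000):
    print(n, squared_increment_sum(n, seed=n), mean_square_error(n))
```

The printed sums scatter around t = 1 with a spread that shrinks like $n^{-1/2}$, in line with the m.s. convergence of Example 3.50.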

Fig. 3.6 Samples of $[B](s) = \sum_{k=1}^{n} (B(t_k \wedge s) - B(t_{k-1} \wedge s))^2$, $s \in [0, 1]$, for $t_k = k/n$

Theorem 3.38 The samples of a Brownian motion process B are of unbounded
variation a.s. in any bounded interval.

Proof Let $p_n = (0 = t_0 < t_1 < \cdots < t_n = t)$ be a sequence of refining partitions
of [0, t] such that $\Delta(p_n) \to 0$ as $n \to \infty$. The left side of the inequality

$$\sum_{k=1}^{n} [B(t_k) - B(t_{k-1})]^2 \le \max_k |B(t_k) - B(t_{k-1})| \sum_{k=1}^{n} |B(t_k) - B(t_{k-1})|$$

converges a.s. to t as $n \to \infty$ by Theorem 3.37 and $\max_k |B(t_k) - B(t_{k-1})|$ converges
a.s. to 0 as $n \to \infty$ by the continuity of the samples of B. To satisfy the above
inequality, $\sum_{k=1}^{n} |B(t_k) - B(t_{k-1})|$ must approach infinity a.s. as $n \to \infty$ (see also
[6], Theorem 29). ▲


3.7.6.2 Poisson and Compound Poisson Processes 

Let $\{T_n, n = 0, 1, 2, \ldots\}$ with $T_0 = 0$ be a strictly increasing sequence of positive
random variables, and define the counting process

$$N(t) = \sum_{n=1}^{\infty} 1(t \ge T_n), \quad t \ge 0. \tag{3.45}$$

We say that N(t) is without explosion if $\sup_n T_n = \infty$ a.s., so that it is finite a.s. in any
bounded time interval.

Definition 3.37 A process $N(t)$, $t \ge 0$, defined on a filtered probability space
$(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, P)$ is said to be a Poisson process if it (1) is $\mathcal{F}_t$-adapted and starts
at zero, (2) has stationary increments that are independent of the past, that is,
$N(t) - N(s)$, $t > s$, has the same distribution as $N(t - s)$ and is independent of $\mathcal{F}_s$, and
(3) is counting without explosion (Table 3.1).

The defining properties of Poisson processes imply that the jump times $\{T_n\}$ must
be $\mathcal{F}_t$-stopping times, since N is $\mathcal{F}_t$-adapted ([6], Theorem 22, p. 14), and that N(t)
is a Poisson random variable with distribution

$$P(N(t) = n) = \frac{(\lambda t)^n}{n!} e^{-\lambda t}, \quad n = 0, 1, \ldots, \tag{3.46}$$

where $\lambda > 0$ denotes an intensity parameter ([6], Theorem 23, p. 14).

The mean and covariance functions of a Poisson process N with intensity $\lambda > 0$
are $E[N(t)] = \lambda t$ and $\mathrm{Cov}[N(s), N(t)] = \lambda (s \wedge t)$. The characteristic function of
N(t) is $\varphi(u; t) = \exp[-\lambda t (1 - e^{iu})]$ (Exercise 3.16).
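A short numerical sketch may help connect (3.45) and (3.46). It assumes a fact not stated in this section, namely that the gaps $T_n - T_{n-1}$ between jump times of a Poisson process with intensity $\lambda$ are independent exponential variables with mean $1/\lambda$. The function names are illustrative.

```python
import math
import random


def poisson_pmf(n, lam, t):
    """P(N(t) = n) in (3.46)."""
    return (lam * t) ** n / math.factorial(n) * math.exp(-lam * t)


def jump_times(lam, t, rng):
    """Jump times T_1 < T_2 < ... in (0, t], built from exponential gaps (assumed)."""
    times = []
    current = rng.expovariate(lam)
    while current <= t:
        times.append(current)
        current += rng.expovariate(lam)
    return times


def counting_process(times, s):
    """N(s) in (3.45): number of jump times T_n with s >= T_n."""
    return sum(1 for T in times if s >= T)


rng = random.Random(0)
times = jump_times(lam=2.0, t=5.0, rng=rng)
print(len(times), counting_process(times, 2.5))
```

Summing `n * poisson_pmf(n, lam, t)` over n recovers the mean $\lambda t$ quoted above, which is a quick consistency check on (3.46).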

Example 3.51 Let N(t) be a Poisson process with intensity $\lambda > 0$. Then N(t) is a
submartingale and its compensated version, $N(t) - \lambda t$, is a martingale. The process
$(N(t) - \lambda t)^2 - \lambda t$ is also a martingale. ◊

Proof N(t) and $N(t) - \lambda t$ have finite mean and are $\mathcal{F}_t$-adapted. For $t \ge s$,
$E[N(t) \mid \mathcal{F}_s] = E[(N(t) - N(s)) + N(s) \mid \mathcal{F}_s] = E[N(t) - N(s)] + N(s) = \lambda (t - s) + N(s) \ge N(s)$
since $N(t) - N(s)$ is independent of $\mathcal{F}_s$ and $N(s)$ is $\mathcal{F}_s$-measurable.
Hence, N is a submartingale and the compensated Poisson process is a martingale.

For $(N(t) - \lambda t)^2 - \lambda t$, note that $E[(N(t) - \lambda t)^2] = \lambda t < \infty$ and $(N(t) - \lambda t)^2 - \lambda t$
is $\mathcal{F}_t$-adapted. For $t \ge s$,

$$E[(N(t) - \lambda t)^2 \mid \mathcal{F}_s] = E[((N(t) - \lambda t) - (N(s) - \lambda s) + (N(s) - \lambda s))^2 \mid \mathcal{F}_s]$$
$$= E[(N(t) - N(s) - \lambda (t - s))^2 \mid \mathcal{F}_s] + E[(N(s) - \lambda s)^2 \mid \mathcal{F}_s]$$
$$+ 2 E[(N(t) - N(s) - \lambda (t - s))(N(s) - \lambda s) \mid \mathcal{F}_s] = \lambda (t - s) + (N(s) - \lambda s)^2$$

since $N(t) - N(s)$ is independent of $\mathcal{F}_s$, $N(s) \in \mathcal{F}_s$, and the compensated process
$N(t) - \lambda t$ is a martingale, so that the cross term vanishes. Hence, $(N(t) - \lambda t)^2 - \lambda t$
is a martingale. ▲

Example 3.52 Set $M(t) = N(t) - t$, $t \ge 0$, where N is a Poisson process with
unit intensity. The quadratic variation process of M is $[M](t) = N(t)$, where [M] is
defined in the same manner as [B] shown in Fig. 3.6. ◊

Proof Let $p_n = \{kt/n, k = 0, 1, \ldots, n\}$ be a partition of [0, t] and set
$\Delta M_k = M(kt/n) - M((k-1)t/n)$. If n is sufficiently large, the intervals $((k-1)t/n, kt/n]$
contain at most a jump of N so that $\Delta M_k = -t/n$ if there is no jump and
$\Delta M_k = 1 - t/n$ if there is a jump. Let $J_n(t)$ be the collection of intervals $((k-1)t/n, kt/n]$
containing a jump of N, so that for sufficiently large n we have

$$S_n(M, M) = \sum_{k \notin J_n(t)} (\Delta M_k)^2 + \sum_{k \in J_n(t)} (\Delta M_k)^2 = \sum_{k \notin J_n(t)} (t/n)^2 + \sum_{k \in J_n(t)} (1 - t/n)^2$$
$$= (n - N(t)) (t/n)^2 + N(t) (1 - t/n)^2 \to N(t), \quad \text{as } n \to \infty,$$

that is, the sum of the squares of the increments of M in [0, t] over partitions of this
time interval converges a.s. to N(t) as the partition mesh decreases to 0. ▲
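The limit in Example 3.52 can be checked on a fixed sample path. The sketch below takes a hypothetical list of jump times, builds $M(s) = N(s) - s$, and evaluates the sum of squared increments of M over the partition of [0, t] into n equal intervals; the function name and the jump times are illustrative.

```python
def compensated_poisson_sum(jumps, t, n):
    """Sum of squared increments of M(s) = N(s) - s over the grid k t / n."""
    def M(s):
        # N(s) counts the jump times not exceeding s; subtract the compensator s.
        return sum(1 for T in jumps if T <= s) - s

    grid = [k * t / n for k in range(n + 1)]
    return sum((M(b) - M(a)) ** 2 for a, b in zip(grid[:-1], grid[1:]))


jumps = [0.3, 1.1, 1.7]  # hypothetical jump times of N in (0, 2], so N(2) = 3
for n in (10, 100, 10000):
    print(n, compensated_poisson_sum(jumps, t=2.0, n=n))
```

Once every interval holds at most one jump, the sum equals $(n - N(t))(t/n)^2 + N(t)(1 - t/n)^2$, which approaches N(t) = 3 as n grows.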

Example 3.53 Let B be a Brownian motion and N denote a Poisson process with
intensity $\lambda > 0$. If B and N are independent of each other, the quadratic variation
process of $B + \tilde{N}$ is $[B + \tilde{N}](t) = t + N(t)$, where $\tilde{N}(t) = N(t) - \lambda t$. ◊

Proof Let $p_n = \{kt/n, k = 0, 1, \ldots, n\}$ be a sequence of partitions of a time interval
[0, t] and consider the sums

$$S_n(B + \tilde{N}, B + \tilde{N}) = \sum_k (\Delta B_k + \Delta \tilde{N}_k)^2 = \sum_k (\Delta B_k)^2 + \sum_k (\Delta \tilde{N}_k)^2 + 2 \sum_k (\Delta B_k)(\Delta \tilde{N}_k),$$

where $\Delta B_k = B(t_k) - B(t_{k-1})$, $\Delta \tilde{N}_k = \tilde{N}(t_k) - \tilde{N}(t_{k-1})$, and $t_k = kt/n$. If the
sequence of partitions $p_n$ is refining and $\Delta(p_n) \to 0$ as $n \to \infty$, then $\sum_k (\Delta B_k)^2$
and $\sum_k (\Delta \tilde{N}_k)^2$ converge to t and N(t) a.s., while $\sum_k (\Delta B_k)(\Delta \tilde{N}_k)$ converges to zero
since the Brownian motion has continuous samples so that
$|\sum_k (\Delta B_k)(\Delta \tilde{N}_k)| \le \sum_k |\Delta B_k| |\Delta \tilde{N}_k| \le \max_k |\Delta B_k| \sum_k |\Delta \tilde{N}_k|$ and $\max_k |\Delta B_k| \to 0$ as $n \to \infty$.
Hence, $S_n(B + \tilde{N}, B + \tilde{N})$ converges a.s. to $t + N(t)$ as $n \to \infty$. If the sequence of
partitions $p_n$ is not refining, the convergence to $t + N(t)$ as $n \to \infty$ is in m.s. and
probability. ▲

Compound Poisson processes have been introduced in Example 3.5 as sums of
iid random variables arriving at the jump times of Poisson processes. This definition
can also be given in the form

$$C(t) = \sum_{n=1}^{\infty} Y_n 1(t \ge T_n) = \sum_{n=1}^{N(t)} Y_n, \quad t \ge 0, \tag{3.47}$$

where N is a Poisson process with intensity $\lambda > 0$, the random variables $\{T_n\}$ denote
the jump times of N, and $\{Y_n\}$ are iid random variables.

If the random variables $\{Y_n\}$ in (3.47) have finite variance, then $E[C(t)] = \lambda t E[Y_1]$
and $\mathrm{Cov}[C(s), C(t)] = \lambda (s \wedge t) E[Y_1^2]$ are the mean and the covariance
functions of C(t) (Example 3.16). The characteristic function of C(t) is
$\varphi(u; t) = E[\exp(iuC(t))] = \exp[-\lambda t (1 - \varphi_{Y_1}(u))]$ since

$$E[\exp(iuC(t))] = E\Big\{E\Big[\exp\Big(iu \sum_{k=1}^{N(t)} Y_k\Big) \,\Big|\, N(t)\Big]\Big\} = E\{\varphi_{Y_1}(u)^{N(t)}\}$$
$$= \sum_{n=0}^{\infty} \frac{(\lambda t)^n}{n!} e^{-\lambda t} \varphi_{Y_1}(u)^n = \exp[-\lambda t (1 - \varphi_{Y_1}(u))],$$

where $\varphi_{Y_1}$ denotes the characteristic function of $Y_1$.
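The conditioning argument above can be verified numerically by summing the series over the values of N(t) and comparing it with the closed form. The sketch below is an illustration only; the jump distribution $Y_1 \sim N(1, 1)$ and all function names are assumptions made for the example.

```python
import cmath
import math


def cf_by_conditioning(u, lam, t, phi_y, n_terms=100):
    """Sum over n of P(N(t) = n) * phi_Y(u)**n, truncated after n_terms terms."""
    phi = phi_y(u)
    weight = math.exp(-lam * t)  # P(N(t) = 0)
    total = 0j
    for n in range(n_terms):
        total += weight * phi ** n
        weight *= lam * t / (n + 1)  # P(N(t) = n + 1) from P(N(t) = n)
    return total


def cf_closed_form(u, lam, t, phi_y):
    """exp[-lam t (1 - phi_Y(u))], the closed form of the series."""
    return cmath.exp(-lam * t * (1.0 - phi_y(u)))


def phi_gauss(u):
    """Characteristic function of a hypothetical jump Y_1 ~ N(1, 1)."""
    return cmath.exp(1j * u - 0.5 * u * u)


for u in (0.0, 0.5, 2.0):
    print(u, cf_by_conditioning(u, 2.0, 1.5, phi_gauss), cf_closed_form(u, 2.0, 1.5, phi_gauss))
```

The two columns agree to rounding error once enough terms are kept, since the Poisson weights decay factorially.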

Example 3.54 Let C be a compound Poisson process defined by (3.47). The quadratic
variation of the compensated compound Poisson process $\tilde{C}(t) = C(t) - \lambda t E[Y_1]$ is
$[\tilde{C}](t) = \sum_{k=1}^{N(t)} Y_k^2$. ◊

Proof Let $p_n = \{kt/n, k = 0, 1, \ldots, n\}$ be a sequence of refining partitions for
a time interval [0, t], and consider a sufficiently large n such that C can have at
most a jump in the intervals $((k-1)t/n, kt/n]$ of this partition. The increments
$\Delta \tilde{C}_k = \tilde{C}(kt/n) - \tilde{C}((k-1)t/n)$ of $\tilde{C}$ are equal to $-\lambda t E[Y_1]/n + Y_k$ and $-\lambda t E[Y_1]/n$
if there is and there is no jump of N(t) in $((k-1)t/n, kt/n]$, respectively. Let $J_n(t)$
be the collection of intervals $((k-1)t/n, kt/n]$ in which N has a jump. We have

$$\sum_{k=1}^{n} (\Delta \tilde{C}_k)^2 = \sum_{k \notin J_n(t)} (\Delta \tilde{C}_k)^2 + \sum_{k \in J_n(t)} (\Delta \tilde{C}_k)^2$$
$$= (n - N(t)) (\lambda t E[Y_1]/n)^2 + \sum_{k=1}^{N(t)} (-\lambda t E[Y_1]/n + Y_k)^2,$$

which converges to $\sum_{k=1}^{N(t)} Y_k^2$ as $n \to \infty$. Note also that the quadratic variation of
C(t) is $[C](t) = \sum_{k=1}^{N(t)} Y_k^2$. ▲

We now give an alternative definition of compound Poisson processes that uses a
random measure specifying the number of jumps of C in rectangles with sides time
and space. The definition provides a link between compound Poisson processes and
Lévy processes, which are discussed in the following section.

Definition 3.38 Let $\mathcal{M}(t, dy)$ be a random measure giving the number of jumps of
C in the rectangle $(0, t] \times (y, y + dy]$ with expectation $E[\mathcal{M}(t, dy)] = \lambda t \, dF(y) = \mu(dy) t$,
where $\mu(dy) = \lambda \, dF(y)$ and F denotes the distribution of $Y_1$. Then

$$C(t) = \int_{\mathbb{R}} y \, \mathcal{M}(t, dy), \tag{3.48}$$

is a compound Poisson process ([26], Theorem 3.3.2).

Theorem 3.39 Let C be a compound Poisson process and A be a Borel set in $\mathbb{R}$.
Then

$$C^A(t) = \sum_{k=1}^{N(t)} Y_k 1(Y_k \in A) = \sum_{0 < s \le t} \Delta C(s) 1(\Delta C(s) \in A) = \int_A y \, \mathcal{M}(t, dy) \tag{3.49}$$

is a compound Poisson process corresponding to a Poisson process with intensity
$\tilde{\lambda} = \lambda P(Y_1 \in A)$, where $\Delta C(s) = C(s) - C(s-)$ and $C(s-) = \lim_{u \uparrow s} C(u)$.

Proof Since $C^A$ retains the jumps of C in A, it is said to be a thinned version of
C. Let $\tilde{Y}_k = Y_k 1(Y_k \in A)$ be the jumps of C taking values in A. The characteristic
function of $\tilde{Y}_1$ is

$$\tilde{\varphi}(u) = E[e^{iuY_1 1(Y_1 \in A)}] = \int_A e^{iuy 1(y \in A)} dF_{Y_1}(y) + \int_{\mathbb{R} \setminus A} e^{iuy 1(y \in A)} dF_{Y_1}(y)$$
$$= \int_A e^{iuy} dF_{Y_1}(y) + \int_{\mathbb{R} \setminus A} dF_{Y_1}(y) = \varphi_{Y_1}^{(A)}(u) + P(Y_1 \notin A)$$

so that $\lambda t (1 - \tilde{\varphi}(u)) = \lambda t (P(Y_1 \in A) - \varphi_{Y_1}^{(A)}(u)) = \tilde{\lambda} t (1 - \hat{\varphi}(u))$, where
$\tilde{\lambda} = \lambda P(Y_1 \in A)$, $\varphi_{Y_1}^{(A)}(u) = \int_A e^{iuy} dF_{Y_1}(y)$, and $\hat{\varphi}(u) = \varphi_{Y_1}^{(A)}(u) / P(Y_1 \in A)$ is the
characteristic function of $Y_1 \mid (Y_1 \in A)$. Since $\varphi_{C^A(t)}(u) = \exp[-\tilde{\lambda} t (1 - \hat{\varphi}(u))]$, $C^A$
is a compound Poisson process with the stated properties. ▲
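The thinning property in Theorem 3.39 can be illustrated by simulation. The sketch below estimates the rate at which jumps with values in A arrive and can be compared with $\lambda P(Y_1 \in A)$. It is a Monte Carlo illustration under assumptions not made in the text: exponential gaps between the jump times of N, jumps $Y_k$ uniform on (-1, 1), and A = (0.5, 1]. All names are hypothetical.

```python
import random


def thinned_jump_count(lam, t, in_set, draw_jump, rng):
    """Number of jumps of C in (0, t] whose size falls in the set A."""
    count = 0
    current = rng.expovariate(lam)  # assumes exponential gaps between jumps
    while current <= t:
        if in_set(draw_jump(rng)):
            count += 1
        current += rng.expovariate(lam)
    return count


def estimate_thinned_intensity(lam, t, in_set, draw_jump, n_paths, seed=0):
    """Average number of retained jumps per unit time over n_paths samples."""
    rng = random.Random(seed)
    total = sum(thinned_jump_count(lam, t, in_set, draw_jump, rng)
                for _ in range(n_paths))
    return total / (n_paths * t)


estimate = estimate_thinned_intensity(
    lam=2.0, t=1.0,
    in_set=lambda y: y > 0.5,                  # A = (0.5, 1]
    draw_jump=lambda rng: rng.uniform(-1.0, 1.0),
    n_paths=20000,
)
print(estimate, 2.0 * 0.25)  # estimate versus lam * P(Y_1 in A)
```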


3.7.6.3 Lévy Processes

We define Lévy processes, review some of their properties, and present exact and
approximate representations for these processes. An extensive discussion on Lévy
processes can be found in [1, 3].

Definition 3.39 Let $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, P)$ be a filtered probability space. A real-valued
process $X(t)$, $t \ge 0$, defined on this space is a Lévy process if it (1) is
$\mathcal{F}_t$-adapted and starts at zero, (2) has stationary increments that are independent of
the past, that is, $X(t) - X(s)$, $t > s$, has the same distribution as $X(t - s)$ and is
independent of $\mathcal{F}_s$, and (3) is continuous in probability (Table 3.1).

Example 3.55 It can be shown that the characteristic function of a Lévy process X
has the functional form

$$\varphi(u; t) = E[e^{iuX(t)}] = e^{-t \psi(u)}, \quad t \ge 0, \ u \in \mathbb{R}, \tag{3.50}$$

where $\psi$ is a continuous function with $\psi(0) = 0$ ([6], Sect. 4). The Lévy process
corresponding to $\varphi(u; t)$ with $\psi(u) = |u|^\alpha$, $\alpha \in (0, 2]$, may or may not be an
$\mathcal{F}_t$-martingale depending on the value of $\alpha$. ◊

Proof Let f denote the density of $X(t) - X(s)$, $t > s$. For every $\varepsilon > 0$ the probability
$P(|X(t) - X(s)| > \varepsilon) = \int_{(-\varepsilon, \varepsilon)^c} f(x) \, dx$ converges to zero as $(t - s) \to 0$ since
$f(x) = (1/(2\pi)) \int_{\mathbb{R}} e^{iux} e^{-(t-s)|u|^\alpha} du \to \delta(x)$ as $(t - s) \to 0$.

It can be shown that, for $\alpha < 2$, $E[|X(t)|^p]$ is finite if and only if $p \in (0, \alpha)$ ([1],
Proposition 1.2.16). Hence, X is not a martingale for $\alpha \le 1$ because the expectation
$E[|X(t)|]$ is not bounded but this process is an $\mathcal{F}_t$-martingale for $\alpha > 1$. If $\alpha = 2$, the
characteristic function of X(t) is $\varphi(u; t) = e^{-t u^2}$ so that $X(t) \sim N(0, 2t)$ is a square
integrable martingale that has the same distribution as $\sqrt{2} B(t)$, where B denotes a
Brownian motion process. ▲

The following four theorems give essential properties of Lévy processes. Note
that these properties are similar to those of compound Poisson processes.

Theorem 3.40 Sums of independent Lévy processes are Lévy processes.

Proof Let $X_1$ and $X_2$ be independent Lévy processes defined on a filtered probability
space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, P)$. Then $X = X_1 + X_2$ is $\mathcal{F}_t$-adapted and has stationary
increments that are independent of the past as the sum of two processes with these
properties. It remains to show that X is continuous in probability. Set
$A_i = \{|X_i(t) - X_i(s)| > \varepsilon/2\}$, i = 1, 2, and $A = \{|X(t) - X(s)| > \varepsilon\}$ for some $\varepsilon > 0$. Since
$A^c \supseteq A_1^c \cap A_2^c$, we have $A \subseteq A_1 \cup A_2$ so that $P(A) \le P(A_1) + P(A_2)$ implying
$P(|X(t) - X(s)| > \varepsilon) \to 0$ as $|t - s| \to 0$ by the continuity in probability of $X_1$ and
$X_2$. ▲

Theorem 3.41 Lévy processes have unique modifications that are Lévy with right
continuous samples that have left limits ([6], Theorem 30, p. 21), preserve their
properties if restarted at stopping times ([6], Theorem 32, p. 23), and have only
jump discontinuities ([6], p. 6).

Theorem 3.42 Let X be a Lévy process and A denote a Borel set such that $0 \notin \bar{A}$. The
associated jump process $J^A(t) = \sum_{0 < s \le t} \Delta X(s) 1(\Delta X(s) \in A) = \int_A y \, \mathcal{M}(t, dy)$
and $X - J^A$ are Lévy processes, where $\mathcal{M}$ is a random measure defining the jump
process $J^A(t)$ ([6], Theorem 37, p. 27).

This theorem implies that the process

$$Z_a(t) = X(t) - \sum_{0 < s \le t} \Delta X(s) 1(|\Delta X(s)| > a) = X(t) - \int_{|y| > a} y \, \mathcal{M}(t, dy) \tag{3.51}$$

consisting of X from which jumps of magnitude larger than a > 0 have been removed
is also Lévy. Moreover, the moments of any order of $Z_a$ are bounded ([6], Theorem
34, p. 25).

Theorem 3.43 The jump processes $J^{A_i}$ in Theorem 3.42 corresponding to Borel
sets $A_1$ and $A_2$ such that $0 \notin \bar{A}_i$, i = 1, 2, and $A_1 \cap A_2 = \emptyset$ are independent Lévy
processes ([6], Theorem 39, p. 30).

Following are exact and approximate representations for Lévy processes. The
exact representation is given by the Lévy decomposition theorem and the
Lévy-Khintchine formula, and is primarily useful for theoretical developments ([3],
Chap. 2, [6], Sect. 1.4, and [1]). The approximate representation is based on features of
the large and small jumps of Lévy processes, and is most useful in applications. Both
representations involve the Lévy measure $\lambda_L(A) = E[N^A(1)]$, that is, the intensity
of the Poisson process $N^A(t) = \sum_{0 < s \le t} 1(\Delta X(s) \in A)$, where $A \in \mathcal{B}$ is a Borel
set such that $0 \notin \bar{A}$, $\bar{A}$ denotes the closure of A, and $\Delta X(s) = X(s) - X(s-)$.

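Theorem 3.44 (Lévy decomposition) If X is a Lévy process, then it admits the
decomposition

$$X(t) = B(t) + \int_{|y| < 1} y \, (\mathcal{M}(t, dy) - t \lambda_L(dy)) + \beta t + \sum_{0 < s \le t} \Delta X(s) 1(|\Delta X(s)| \ge 1), \tag{3.52}$$

where B is a Brownian motion, $\mathcal{M}$ is a random measure, $\lambda_L$ is a measure on $\mathbb{R} \setminus \{0\}$
such that $\int \min(1, y^2) \lambda_L(dy) < \infty$, $N^A(t) = \int_A \mathcal{M}(t, dy)$ is a Poisson process
with intensity $\lambda_L(A)$ for any Borel set A, $0 \notin \bar{A}$, such that, if $\Gamma$ is another Borel set
with the properties $0 \notin \bar{\Gamma}$ and $\Gamma \cap A = \emptyset$, then $N^A$ and $N^\Gamma$ are independent, $N^A$
is independent of B, and $\beta = E[X(1) - \sum_{0 < s \le 1} \Delta X(s) 1(|\Delta X(s)| \ge 1)] \in \mathbb{R}$ ([6],
Theorem 42, p. 32).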

Theorem 3.45 (Lévy-Khintchine formula) A Lévy process X is defined by its
characteristic function that has the expression $\varphi(u; t) = E[e^{iuX(t)}] = e^{-t \psi(u)}$, where

$$\psi(u) = \frac{\sigma^2 u^2}{2} - i \beta u + \int_{|y| \ge 1} (1 - e^{iuy}) \lambda_L(dy) + \int_{|y| < 1} (1 - e^{iuy} + iuy) \lambda_L(dy), \tag{3.53}$$

and $\lambda_L$, $\sigma^2$, and $\beta$ need to be specified. The parameters $\lambda_L$, $\sigma^2$, and $\beta$ define a
Lévy process uniquely in distribution ([6], Theorem 43, p. 32).

The approximate representation for a Lévy process X(t) has two components. The
first is the process $\sum_{0 < s \le t} \Delta X(s) 1(|\Delta X(s)| > a)$ defined by the jumps of X(t) with
magnitude exceeding a > 0, and is represented exactly. The second is the process
$Z_a(t)$ in (3.51), and is described approximately by a scaled Brownian motion with
scaling factor depending on a.

Example 3.56 Let $X(t)$, $t \ge 0$, be a Lévy process with characteristic function
$\varphi(u; t) = E[e^{iuX(t)}] = \exp(-t |u|^\alpha)$, where $\alpha \in (0, 2]$. Figure 3.7 shows a sample
of X and a sample of a Brownian motion B and the corresponding samples of
the quadratic variation processes [B] and [X]. The sample of X exhibits jump
discontinuities which are emphasized in the sample of [X]. The steady increase of [X]
resembling [B] is marked by jumps that are typical of the quadratic variation of a
compound Poisson process. ◊

Let $X = L_\alpha$, $0 < \alpha < 2$, be a symmetric $\alpha$-stable process, that is, a Lévy process
with characteristic function

$$\varphi(u; t) = E[e^{iuL_\alpha(t)}] = \exp(-t |u|^\alpha), \quad t \ge 0, \ u \in \mathbb{R}, \tag{3.54}$$

given by (3.53) with $\sigma = 0$, $\beta = 0$, and the Lévy measure

$$\lambda_L(dy) = \frac{\alpha c_\alpha}{2} |y|^{-(\alpha+1)} dy, \quad y \in \mathbb{R} \setminus \{0\}, \tag{3.55}$$

where $c_\alpha = (1 - \alpha) / [\Gamma(2 - \alpha) \cos(\pi \alpha / 2)]$ for $\alpha \ne 1$ and $c_\alpha = 2/\pi$ for $\alpha = 1$ ([1],
Property 1.2.15). The definition in (3.54) implies that the increments $dL_\alpha(t)$ of $L_\alpha$
are symmetric $\alpha$-stable random variables with scale $(dt)^{1/\alpha}$ centered at 0.

There are notable similarities and differences between compound Poisson and
$\alpha$-stable processes. Let $\mathcal{M}(dt, dy)$ be a random measure defining the number of
jumps in the infinitesimal rectangle $(t, t + dt] \times (y, y + dy]$. The expectation of
$\mathcal{M}(dt, dy)$ is $\lambda \, dt \, dF(y)$ for the compound Poisson process C in (3.48) and $\lambda_L(dy) \, dt$
for an $\alpha$-stable process $L_\alpha$, where F denotes the distribution of the jumps $\{Y_k\}$ of
C and $\lambda_L$ is the Lévy measure. Let A be an arbitrary Borel set in $\mathbb{R}$. The integral
$(1/dt) \int_A E[\mathcal{M}(dt, dy)]$ gives the average number of jumps in A per unit of time.
This number is $\lambda \int_A dF(y) \le \lambda < \infty$ for C but can be finite or not for $L_\alpha$ depending
on A, for example,

$$\int_{(-a,a)^c} \lambda_L(dy) < \infty \quad \text{and} \quad \int_{(-a,a)} \lambda_L(dy) = \infty, \tag{3.56}$$

that is, the average number of jumps of $L_\alpha$ with magnitude larger and smaller than
a > 0 is finite and unbounded, respectively.

Fig. 3.7 Samples of a Brownian motion and a Lévy process ($\alpha = 1.5$) and of corresponding
quadratic variations

Consider the process $\sum_{0 < s \le t} \Delta L_\alpha(s) 1(|\Delta L_\alpha(s)| > a)$ in (3.51), that is, the
compound Poisson process

$$C_{\alpha,a}(t) = \sum_{k=1}^{N_a(t)} Y_{a,k}, \quad t \ge 0, \tag{3.57}$$

where $\{Y_{a,k}\}$ denote the jumps of $L_\alpha$ with magnitude larger than a > 0 and $N_a$
is a Poisson counting process with intensity $\lambda_a = c_\alpha / a^\alpha$. Recall that $\{Y_{a,k}\}$ are
independent identically distributed random variables. The distribution of $Y_{a,1}$ with
$\lambda_L$ in (3.55) is

$$F_a(y) = \frac{\int_{(-a,a)^c \cap (-\infty,y)} \lambda_L(dz)}{\int_{(-a,a)^c} \lambda_L(dz)} = \frac{\alpha a^\alpha}{2} \int_{(-a,a)^c \cap (-\infty,y)} |z|^{-(\alpha+1)} dz$$
$$= \frac{a^\alpha}{2 |y|^\alpha} 1(y < -a) + \frac{1}{2} 1(y \ge -a) + \frac{1 - (a/y)^\alpha}{2} 1(y > a), \quad y \in \mathbb{R}, \tag{3.58}$$

so that the density and the characteristic functions of $Y_{a,1}$ are

$$f_a(y) = \frac{\alpha a^\alpha}{2} \left[ |y|^{-(\alpha+1)} 1(y < -a) + y^{-(\alpha+1)} 1(y > a) \right], \quad y \in \mathbb{R}, \tag{3.59}$$

and

$$\varphi_a(u) = \alpha a^\alpha \int_a^{\infty} \cos(uy) \, y^{-(\alpha+1)} dy, \tag{3.60}$$

respectively. The asymptotic behavior of the distribution and density functions in
(3.58) and (3.59) as $y \to \infty$ is $y^{-\alpha}$ and $y^{-(\alpha+1)}$, respectively.
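Because the large jumps have an explicit distribution, they can be sampled by inverting it. The sketch below is an illustration under the assumption that the Lévy measure has density proportional to $|y|^{-(\alpha+1)}$, so that the jump magnitude satisfies $P(|Y_{a,1}| > y) = (a/y)^\alpha$ for $y \ge a$ and the sign is symmetric; the function names are hypothetical.

```python
import math
import random


def c_alpha(alpha):
    """Constant c_alpha of the symmetric alpha-stable Levy measure."""
    if alpha == 1.0:
        return 2.0 / math.pi
    return (1.0 - alpha) / (math.gamma(2.0 - alpha) * math.cos(math.pi * alpha / 2.0))


def jump_intensity(alpha, a):
    """Intensity lambda_a = c_alpha / a**alpha of jumps with magnitude above a."""
    return c_alpha(alpha) / a ** alpha


def large_jump_cdf(y, alpha, a):
    """Distribution of a large jump: mass 1/2 on each of (-inf, -a) and (a, inf)."""
    if y < -a:
        return 0.5 * (a / abs(y)) ** alpha
    if y <= a:
        return 0.5
    return 1.0 - 0.5 * (a / y) ** alpha


def sample_large_jump(alpha, a, rng):
    """Inverse-transform sample: magnitude a * U**(-1/alpha) with a random sign."""
    magnitude = a * (1.0 - rng.random()) ** (-1.0 / alpha)  # 1 - U lies in (0, 1]
    return magnitude if rng.random() < 0.5 else -magnitude


rng = random.Random(0)
print(jump_intensity(1.5, 0.1), [sample_large_jump(1.5, 0.1, rng) for _ in range(3)])
```

Combined with a Poisson number of jumps of intensity $\lambda_a$, these samples give paths of the compound Poisson process in (3.57).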

Let $\tilde{L}_{\alpha,a}$ be a stochastic process obtained from $L_\alpha$ by excluding the jumps of $L_\alpha$
with magnitude larger than a > 0, that is, the process

$$\tilde{L}_{\alpha,a}(t) = L_\alpha(t) - C_{\alpha,a}(t) \tag{3.61}$$

with $C_{\alpha,a}$ in (3.57). Note that (1) $\tilde{L}_{\alpha,a}$ and $C_{\alpha,a}$ are independent Lévy processes ([6],
Theorem 39, p. 30), (2) $\tilde{L}_{\alpha,a}$ has finite absolute moments of any order since it is a
Lévy process with bounded jumps ([6], Theorem 34, p. 25), (3) $\tilde{L}_{\alpha,a}$ has mean 0
since its density is an even function, (4) the variance of $\tilde{L}_{\alpha,a}(t)$ is linear in t since
$\tilde{L}_{\alpha,a}(t)$ has stationary independent increments, and (5) the approximation

$$L_\alpha(t) \simeq L_{\alpha,a}(t) = \sigma(\alpha, a) B(t) + C_{\alpha,a}(t), \quad t \ge 0, \tag{3.62}$$

holds in the sense that $\sigma(\alpha, a)^{-1} (L_\alpha(t) - C_{\alpha,a}(t))$ converges weakly to a standard
Brownian motion as $a \to 0$ in D[0, 1] under the topology induced by the uniform
metric if and only if for each k > 0 we have $\sigma(\alpha, k \sigma(\alpha, a) \wedge a) \sim \sigma(\alpha, a)$ as $a \to 0$,
where

$$\sigma(\alpha, a)^2 = E[\tilde{L}_{\alpha,a}(1)^2] = \int_{-a}^{a} y^2 \lambda_L(dy) = c_\alpha \frac{\alpha}{2 - \alpha} a^{2-\alpha}, \tag{3.63}$$

B is a standard Brownian motion, and D[0, 1] denotes the space of real-valued right
continuous functions with left limits [27].

The characteristic function of $L_{\alpha,a}(t) = \sigma(\alpha, a) B(t) + C_{\alpha,a}(t)$ is

$$\varphi_{\alpha,a}(u; t) = \exp\left[ -\frac{(u \sigma(\alpha, a))^2 t}{2} - \lambda_a t (1 - \varphi_a(u)) \right], \tag{3.64}$$

where $\lambda_a = c_\alpha / a^\alpha$ and $\varphi_a$ is given by (3.60). Figure 3.8 shows the characteristic
functions $\varphi_a(u)$, $u \ge 0$, for $\alpha = 1$, a = 0.1 (left panel), and a = 1 (right
panel). The difference between these functions is significant. The solid and dotted
lines in Fig. 3.9 are the characteristic functions of $L_\alpha(t)$ and $L_{\alpha,a}(t)$, respectively,
for $\alpha = 1$, t = 1, a = 0.1 (left panel), and a = 1 (right panel). Although the
characteristic functions $\varphi_a$ corresponding to different values of a differ significantly
(Fig. 3.8), the approximate characteristic functions of $L_\alpha$ depend weakly on a and
are accurate. The characteristic function of $L_{\alpha,a}(t)$ for a = 0.1 coincides with that
of $L_\alpha(t)$ at the figure scale.
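The comparison between the exact and approximate characteristic functions can be reproduced numerically. The sketch below is an illustration, not the computation behind Fig. 3.9. To avoid the oscillatory integral in (3.60), it assumes the identity $\lambda_a (1 - \varphi_a(u)) = |u|^\alpha - \alpha c_\alpha \int_0^a (1 - \cos(uy)) y^{-(\alpha+1)} dy$, which follows from (3.54) and a symmetric Lévy measure with density $(\alpha c_\alpha / 2) |y|^{-(\alpha+1)}$. The bounded integral is evaluated with a midpoint rule, and all function names are hypothetical.

```python
import math


def c_alpha(alpha):
    """Constant c_alpha of the symmetric alpha-stable Levy measure."""
    if alpha == 1.0:
        return 2.0 / math.pi
    return (1.0 - alpha) / (math.gamma(2.0 - alpha) * math.cos(math.pi * alpha / 2.0))


def sigma_squared(alpha, a):
    """Variance parameter sigma(alpha, a)**2 in (3.63)."""
    return c_alpha(alpha) * alpha / (2.0 - alpha) * a ** (2.0 - alpha)


def small_jump_exponent(u, alpha, a, n_steps=4000):
    """alpha * c_alpha * int_0^a (1 - cos(u y)) y**-(alpha + 1) dy, midpoint rule."""
    h = a / n_steps
    total = 0.0
    for k in range(n_steps):
        y = (k + 0.5) * h
        # 1 - cos(x) = 2 sin(x/2)**2 avoids cancellation for small x.
        total += 2.0 * math.sin(0.5 * u * y) ** 2 / y ** (alpha + 1.0)
    return alpha * c_alpha(alpha) * total * h


def approx_cf(u, t, alpha, a):
    """Characteristic function (3.64) of sigma(alpha, a) B(t) + C_{alpha,a}(t)."""
    large_jump_exponent = abs(u) ** alpha - small_jump_exponent(u, alpha, a)
    return math.exp(-0.5 * sigma_squared(alpha, a) * u * u * t - t * large_jump_exponent)


def exact_cf(u, t, alpha):
    """Characteristic function (3.54) of the alpha-stable process."""
    return math.exp(-t * abs(u) ** alpha)


for u in (0.5, 1.0, 5.0):
    print(u, exact_cf(u, 1.0, 1.0), approx_cf(u, 1.0, 1.0, 0.1))
```

For $\alpha = 1$, t = 1, and a = 0.1 the two columns are close, consistent with the remark that the approximation coincides with the exact characteristic function at the figure scale.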


Fig. 3.8 Characteristic functions $\varphi_a$ in (3.60) with $\alpha = 1$ for a = 0.1 (left panel) and a = 1 (right
panel)

Fig. 3.9 Characteristic functions of $L_\alpha(t)$ (solid lines) and $L_{\alpha,a}(t)$ (dotted lines) for $\alpha = 1$, t = 1,
a = 0.1 (left panel), and a = 1 (right panel)


3.7.6.4 White Noise Processes

Brownian motion and compound Poisson processes are not m.s. differentiable since
their covariance functions at arbitrary times s and t are proportional to $s \wedge t$. The
covariance function of compound Poisson processes exists if their jumps have finite
variance. The samples of the Lévy process are also too rough to be differentiable.
Yet, it is common in the applied literature to interpret the derivative of the Brownian
motion, compound Poisson, and Lévy processes as white noise processes, and refer to
them as Gaussian, Poisson, and Lévy white noise processes, respectively. We denote
these processes by

$$W_G(t) = dB(t)/dt \quad \text{(Gaussian white noise)},$$
$$W_P(t) = dC(t)/dt \quad \text{(Poisson white noise), and}$$
$$W_L(t) = dL_\alpha(t)/dt \quad \text{(Lévy white noise)}. \tag{3.65}$$

Since Gaussian, Poisson, and Lévy white noise processes do not exist, calculations
with these processes are formal so that results obtained by these calculations are
questionable.

In the following chapters we will see how white noise processes can be incorporated
rigorously in the theory of stochastic differential equations and develop practical
methods for solving these equations. Relationships between white and physical or
colored noise processes will also be examined.

We conclude this brief section with the observation that, while samples of
Gaussian and Lévy white noise processes cannot be visualized, samples of Poisson
white noise can be drawn. They are sequences of iid random variables $\{Y_n\}$
arriving at the jump times $\{T_n\}$ of a Poisson process N, that is,

$$W_P(t) = \sum_{n=1}^{N(t)} Y_n \, \delta(t - T_n), \tag{3.66}$$

where $\delta(\cdot)$ denotes the Dirac delta function.

3.8 Monte Carlo Simulation 

Our discussion is limited to Monte Carlo simulation methods for generating
samples of stationary Gaussian random functions, translation vector processes, and
real-valued non-stationary Gaussian processes based on the spectral representation
theorem and an extension of this theorem. These and other topics on Monte Carlo
simulation are discussed extensively in [5] (Chap. 5) and [28, 29].


3.8.1 Stationary Gaussian Random Functions 

We (1) construct sequences of random functions depending on finite collections of 
random variables that approach in some sense target Gaussian/translation random 
functions and (2) develop algorithms for generating samples of these random func- 
tions. Developments are based on the spectral representation in Sect. 3.6.4 showing 
that weakly stationary random functions can be viewed as superpositions of harmon- 
ics with random amplitudes. 

3.8.1.1 Stochastic Processes 

Let X(t) be an $\mathbb{R}^d$-valued stationary Gaussian process with mean zero, covariance
function $c(\tau) = E[X(t + \tau) X(t)']$, and spectral density $s(\nu) = \{s_{k,l}(\nu)\}$, k, l =
1, ..., d, with support $[-\bar{\nu}, \bar{\nu}]$, $0 < \bar{\nu} < \infty$. Consider a partition
$p_n = (0 = \alpha_0 < \alpha_1 < \cdots < \alpha_n = \bar{\nu})$ of the frequency band $[0, \bar{\nu}]$, and set
$\Delta \nu_r = \alpha_r - \alpha_{r-1}$ and $\nu_r = (\alpha_{r-1} + \alpha_r)/2$, r = 1, ..., n. It is common to use equal
frequency intervals in which case $\Delta \nu_r = \bar{\nu}/n$ and $\nu_r = (r - 1/2) \bar{\nu}/n$, r = 1, ..., n. If the
support of $s(\nu)$ is not bounded, a cutoff frequency $0 < \bar{\nu} < \infty$ needs to be selected such
that most of the energy of the process is included in the frequency band $[0, \bar{\nu}]$.

Let $A_r$ and $B_r$ be $\mathbb{R}^d$-valued Gaussian variables with mean zero and second
moments

$$E[A_{r,k} A_{p,l}] = E[B_{r,k} B_{p,l}] = \delta_{rp} \int_{\alpha_{r-1}}^{\alpha_r} g_{k,l}(\nu) \, d\nu \simeq \delta_{rp} \, g_{k,l}(\nu_r) \Delta \nu_r,$$
$$E[A_{r,k} B_{p,l}] = -E[B_{r,k} A_{p,l}] = \delta_{rp} \int_{\alpha_{r-1}}^{\alpha_r} h_{k,l}(\nu) \, d\nu \simeq \delta_{rp} \, h_{k,l}(\nu_r) \Delta \nu_r, \tag{3.67}$$

where $g(\nu) = s(\nu) + s(-\nu)$ and $h(\nu) = -i (s(\nu) - s(-\nu))$ (Theorem 3.19). The
approximations in (3.67) are satisfactory if the functions $g_{k,l}(\nu)$ and $h_{k,l}(\nu)$ are
nearly constant in the intervals $(\alpha_{r-1}, \alpha_r)$.

Theorem 3.46 If $\Delta(p_n) = \max_{1 \le r \le n} (\alpha_r - \alpha_{r-1}) \to 0$ and $\bar{\nu} \to \infty$, then

$$X^{(n)}(t) = \sum_{r=1}^{n} [A_r \cos(\nu_r t) + B_r \sin(\nu_r t)]$$

becomes a version of the stationary Gaussian process X(t).

Proof Since $X^{(n)}$ has mean zero and covariance function with entries

$$E[X_i^{(n)}(s) X_j^{(n)}(t)] = \sum_{r=1}^{n} \left\{ \left[ \int_{\alpha_{r-1}}^{\alpha_r} g_{i,j}(\nu) \, d\nu \right] \cos(\nu_r (t - s)) - \left[ \int_{\alpha_{r-1}}^{\alpha_r} h_{i,j}(\nu) \, d\nu \right] \sin(\nu_r (t - s)) \right\}, \quad i, j = 1, \ldots, d, \tag{3.68}$$

it is weakly stationary for each n. The process $X^{(n)}(t)$ is Gaussian as a linear form
of the Gaussian variables $A_r$ and $B_r$. If d = 1, then $g_{1,1} = g$, $h_{1,1} = 0$, and
$E[X^{(n)}(s) X^{(n)}(t)] = \sum_{r=1}^{n} [\int_{\alpha_{r-1}}^{\alpha_r} g(\nu) \, d\nu] \cos(\nu_r (t - s))$. Since

$$\lim_{n \to \infty} E[X_i^{(n)}(s) X_j^{(n)}(t)] = \int_0^{\bar{\nu}} \left[ g_{i,j}(\nu) \cos(\nu (t - s)) - h_{i,j}(\nu) \sin(\nu (t - s)) \right] d\nu$$

for a fixed $\bar{\nu}$, the covariance function of $X^{(n)}$ converges to that of X by letting $\bar{\nu} \to \infty$
in case the frequency band of X is not bounded. Since $X^{(n)}$ and X are Gaussian
processes, $X^{(n)}$ becomes a version of X under these limits. ▲

The statement in Theorem 3.46 justifies the use of samples of $X^{(n)}$ as a substitute
for samples of X. Additional arguments on the validity of this approximation can be
found in [28] (Chap. 4) and [29] (Chaps. 1, 2, and 3).

The generation of samples of $X^{(n)}$ in a time interval $[0, \tau]$, $\tau > 0$, involves
the following three steps. First, select a cutoff frequency $0 < \bar{\nu} < \infty$ in case the
frequency band of X is unbounded, partition the frequency band $[0, \bar{\nu}]$, and specify
discrete frequencies $\nu_r$, r = 1, ..., n. Second, generate samples of the Gaussian
random variables $A_r$ and $B_r$, r = 1, ..., n, with mean zero and second moments in
(3.67). Third, calculate samples of $X^{(n)}(t)$ in $[0, \tau]$ from the series defining $X^{(n)}(t)$
in Theorem 3.46.

Fig. 3.10 Samples of the coordinates $X_1$ and $X_2$ of X for $\rho = 0.3$ (left panel) and $\rho = 0.7$ (right
panel)

Note that the samples of $X^{(n)}(t)$ are periodic with period $2\pi/\nu_1$ if $\Delta \nu_r = \bar{\nu}/n$
and $\nu_r = (r - 1/2) \Delta \nu_r$ and infinitely differentiable even if the samples of X are
not differentiable. For example, the samples of $X^{(n)}(t)$ for a stationary Gaussian
process X(t) with exponential correlation are infinitely differentiable although X(t)
is not mean square differentiable.
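The three simulation steps and the periodicity remark can be sketched for a real-valued process (d = 1), for which h = 0 and $A_r$, $B_r$ are independent with variance $g(\nu_r) \Delta \nu_r$. The code below is a minimal illustration under stated assumptions: equal frequency intervals, the midpoint approximation in (3.67), and a hypothetical band-limited spectral density with unit variance. The function names are illustrative.

```python
import math
import random


def spectral_sample(g, nu_bar, n, seed=0):
    """Return one sample t -> X^(n)(t) of a real-valued process and its frequencies."""
    rng = random.Random(seed)
    d_nu = nu_bar / n                                    # equal frequency intervals
    freqs = [(r - 0.5) * d_nu for r in range(1, n + 1)]  # midpoints nu_r
    stds = [math.sqrt(g(nu) * d_nu) for nu in freqs]     # from (3.67) with h = 0
    a = [rng.gauss(0.0, s) for s in stds]
    b = [rng.gauss(0.0, s) for s in stds]

    def x(t):
        return sum(ar * math.cos(nu * t) + br * math.sin(nu * t)
                   for ar, br, nu in zip(a, b, freqs))

    return x, freqs


def model_covariance(g, nu_bar, n, tau):
    """Covariance of X^(n) at lag tau for d = 1, using g(nu_r) * d_nu."""
    d_nu = nu_bar / n
    return sum(g((r - 0.5) * d_nu) * d_nu * math.cos((r - 0.5) * d_nu * tau)
               for r in range(1, n + 1))


NU_BAR = 5.0  # hypothetical cutoff frequency


def g_band_limited(nu):
    """Hypothetical density g = s(nu) + s(-nu) of unit-variance band-limited noise."""
    return 1.0 / NU_BAR if 0.0 <= nu <= NU_BAR else 0.0


x, freqs = spectral_sample(g_band_limited, NU_BAR, n=50)
period = 2.0 * math.pi / freqs[0]
print(x(0.7), x(0.7 + period), model_covariance(g_band_limited, NU_BAR, 50, 1.0))
```

The first two printed values agree up to rounding, which shows the period $2\pi/\nu_1$; a simulation window $[0, \tau]$ should therefore stay well below this period, which in practice means choosing n large enough.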

Example 3.57 Let $X(t)$ be an $\mathbb{R}^2$-valued stationary Gaussian process with spectral density

$$s_{k,l}(\nu) = (1 - \rho)\,\delta_{kl}\, s_k(\nu) + \rho\, s_Z(\nu), \quad k, l = 1, 2,$$

where $s_k(\nu) = 1/(2\nu_k)\,1(-\nu_k < \nu < \nu_k)$, $s_Z(\nu) = 1/(2\bar\nu)\,1(-\bar\nu < \nu < \bar\nu)$, and $0 < \nu_k, \bar\nu < \infty$. Figure 3.10 shows samples of the coordinates of $X$ for $\nu_k = 25$, $\bar\nu = 5$, $\Delta\nu_r = \max(\nu_1, \nu_2, \bar\nu)/n$, $\nu_r = (r - 1/2)\Delta\nu_r$, $r = 1,\ldots,n$, $n = 100$, and two values of $\rho$. The frequency content of $X(t)$ depends strongly on $\rho$. The coordinates $X_1$ and $X_2$ of $X$ are nearly in phase for values of $\rho$ close to unity. ◊
3.8.1.2 Random Fields

Let $X(t)$, $t \in \mathbb{R}^{d'}$, be a real-valued, homogeneous Gaussian field with mean zero, covariance function $c(\tau) = E[X(t + \tau)X(t)]$, and spectral density $s(\nu)$, where $t, \tau \in \mathbb{R}^{d'}$ are spatial coordinates and $\nu \in \mathbb{R}^{d'}$ is a frequency. Our discussion is limited to real-valued random fields. Extension to vector-valued random fields can be based on arguments similar to those for vector processes.

Let $D \subset \mathbb{R}^{d'}$ be the support of the spectral density $s(\nu)$ of $X$, assumed to be a bounded set. We (i) take $D$ to be a bounded rectangle centered at the origin of $\mathbb{R}^{d'}$, that is, $D = \times_{k=1}^{d'} [-\bar\nu_k, \bar\nu_k]$, $\bar\nu_k > 0$, (ii) partition $D$ in rectangles $D_r$, $r = 1,\ldots,n$, defined by a grid with step $\Delta\nu_k = \bar\nu_k/n_k$, $k = 1,\ldots,d'$, where $n_k \geq 1$ are integers, and (iii) select $\nu_r$ at the center of $D_r$. Let $A_r$ and $B_r$, $r = 1,\ldots,n$, be independent Gaussian variables with mean zero and variance

$$\gamma_r = E[A_r^2] = E[B_r^2] = \int_{D_r} s(\nu)\,d\nu. \tag{3.69}$$

Theorem 3.47 The sequence

$$X^{(n)}(t) = \sum_{r=1}^{n} \left[ A_r \cos(\nu_r \cdot t) + B_r \sin(\nu_r \cdot t) \right], \tag{3.70}$$

becomes a version of $X$ in the limit as $\max_{1 \leq r \leq n}\{\lambda(D_r)\} \to 0$, $n \to \infty$, and $\min_{1 \leq k \leq d'}\{\bar\nu_k\} \to \infty$, where $\nu_r \cdot t = \sum_{k=1}^{d'} \nu_{r,k} t_k$, $\nu_r = (\nu_{r,1},\ldots,\nu_{r,d'})$, $\lambda(D_r)$ denotes the Lebesgue measure of $D_r$, and $t = (t_1,\ldots,t_{d'})$.

Proof The random fields $X^{(n)}$ are Gaussian with mean zero and covariance functions

$$E[X^{(n)}(s)X^{(n)}(t)] = \sum_{r,p=1}^{n} E\left[ \left(A_r \cos(\nu_r \cdot s) + B_r \sin(\nu_r \cdot s)\right)\left(A_p \cos(\nu_p \cdot t) + B_p \sin(\nu_p \cdot t)\right) \right]$$

$$= \sum_{r=1}^{n} \gamma_r \left( \cos(\nu_r \cdot s)\cos(\nu_r \cdot t) + \sin(\nu_r \cdot s)\sin(\nu_r \cdot t) \right) = \sum_{r=1}^{n} \gamma_r \cos(\nu_r \cdot (t - s)),$$

so that they are homogeneous for each $n \geq 1$. Considerations as in the proof of Theorem 3.46 show that $E[X^{(n)}(t)X^{(n)}(s)]$ converges to the covariance function of $X$ as $n \to \infty$ and the boundaries of $D$ are extended to $\mathbb{R}^{d'}$, in case $D$ is not bounded. Hence, $X^{(n)}$ becomes a version of $X$ as the partition of $D$ is refined and the frequency band is increased indefinitely. ▲

The generation of samples of $X^{(n)}$ in a subset $S$ of $\mathbb{R}^{d'}$ involves the following three steps. First, specify cutoff frequencies $\bar\nu_k > 0$, $k = 1,\ldots,d'$, in case the frequency band of $X$ is unbounded, partition $D = \times_{k=1}^{d'} [-\bar\nu_k, \bar\nu_k]$ in $n$ rectangles, and select interior points $\nu_r$, $r = 1,\ldots,n$, in these rectangles. Second, generate samples of the Gaussian random variables $A_r$ and $B_r$, $r = 1,\ldots,n$, with mean zero and second moments in (3.69). Third, calculate samples of $X^{(n)}(t)$, $t \in S$, from (3.70).

Fig. 3.11 Samples of $X$ for two discrete spectral densities

Example 3.58 Let $X(t)$, $t \in \mathbb{R}^2$, be a real-valued homogeneous Gaussian field defined by

$$X(t) = \sum_{r=1}^{n} \left[ A_r \cos(\nu_r \cdot t) + B_r \sin(\nu_r \cdot t) \right],$$

where $A_r$ and $B_r$ are independent Gaussian variables with mean zero and variance $E[A_r^2] = E[B_r^2] = \gamma_r$. Since the field $X(t)$ has the form in (3.70), it can be used directly to produce samples. Figure 3.11 shows two samples of $X$ for $n = 6$, $\nu_1 = (1, 2)$, $\nu_2 = (2, 1)$, $\nu_3 = (2, 2)$, $\nu_4 = -\nu_1$, $\nu_5 = -\nu_2$, and $\nu_6 = -\nu_3$. Recall that spectral densities $s(\nu)$ of real-valued homogeneous random fields must satisfy the condition $s(\nu) = s(-\nu)$ (Theorem 3.8). The left sample is for $\gamma_r = 1$, $r = 1,\ldots,6$. The right sample corresponds to $\gamma_1 = \gamma_4 = 1$ and $\gamma_r = 0.01$ for $r \neq 1, 4$, and has a dominant wave with frequency $\nu_1 = (1, 2)$. ◊
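The field in Example 3.58 can be sampled directly from (3.70). The following is a minimal sketch under stated assumptions: the function names `field_samples` and `model_covariance`, the evaluation grid on $[0, 2\pi]^2$, and the random seed are choices made here for illustration and are not taken from the text.

```python
import numpy as np

def field_samples(nu, gamma, points, n_samples, rng):
    """Samples of X(t) = sum_r [A_r cos(nu_r . t) + B_r sin(nu_r . t)] at the given points."""
    nu = np.asarray(nu, dtype=float)               # (n, d') frequencies nu_r
    gamma = np.asarray(gamma, dtype=float)         # (n,) variances of A_r and B_r
    points = np.asarray(points, dtype=float)       # (m, d') spatial coordinates t
    scale = np.sqrt(gamma)
    a = rng.standard_normal((n_samples, gamma.size)) * scale
    b = rng.standard_normal((n_samples, gamma.size)) * scale
    phase = nu @ points.T                          # (n, m), entries nu_r . t
    return a @ np.cos(phase) + b @ np.sin(phase)   # (n_samples, m)

def model_covariance(nu, gamma, s, t):
    """Covariance sum_r gamma_r cos(nu_r . (t - s)) of the field in (3.70)."""
    lag = np.asarray(t, dtype=float) - np.asarray(s, dtype=float)
    return float(np.sum(np.asarray(gamma, dtype=float) * np.cos(np.asarray(nu, dtype=float) @ lag)))

# Frequencies of Example 3.58: nu_4 = -nu_1, nu_5 = -nu_2, nu_6 = -nu_3.
nu = np.array([[1, 2], [2, 1], [2, 2], [-1, -2], [-2, -1], [-2, -2]], dtype=float)
gamma_left = np.ones(6)                                     # left sample: gamma_r = 1
gamma_right = np.array([1.0, 0.01, 0.01, 1.0, 0.01, 0.01])  # right sample: gamma_1 = gamma_4 = 1

grid = np.linspace(0.0, 2.0 * np.pi, 40)                    # hypothetical plotting grid
t1, t2 = np.meshgrid(grid, grid, indexing="ij")
points = np.column_stack([t1.ravel(), t2.ravel()])
rng = np.random.default_rng(1)
sample_left = field_samples(nu, gamma_left, points, 1, rng)[0].reshape(t1.shape)
sample_right = field_samples(nu, gamma_right, points, 1, rng)[0].reshape(t1.shape)
```

Because the model covariance depends on $s$ and $t$ only through $t - s$, `model_covariance` returns the same value for any pair of points with the same lag, which is the homogeneity property used in the proof of Theorem 3.47.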


3.8.2 Translation Vector Processes 

Our objectives are to (1) construct translation models $X^{(T)}(t)$ for $\mathbb{R}^d$-valued stationary non-Gaussian processes $X(t)$ that match both the marginal distributions and the covariance functions of these processes and (2) develop a Monte Carlo algorithm for generating samples of $X^{(T)}(t)$. Translation models were introduced in Sect. 3.7.2.

The first objective cannot always be achieved since translation models with the required properties may not exist ([4], Sect. 3.1.1), in which case we have three options [16]: match the marginal distributions and approximate the covariance function, approximate the marginal distributions and match the covariance function, or approximate both the marginal distributions and the covariance functions. We discuss the first option.

Let $G(t)$ be an $\mathbb{R}^d$-valued weakly stationary process whose coordinates $G_k(t)$, $k = 1,\ldots,d$, have mean 0, variance 1, covariance functions $\rho_{k,l}(\tau) = E[G_k(t + \tau)G_l(t)]$, and spectral densities $s_{k,l}(\nu)$, $k, l = 1,\ldots,d$. The covariance and spectral density functions are related by (Theorem 3.7)

$$\rho_{k,l}(\tau) = \int_{-\infty}^{\infty} e^{i\nu\tau} s_{k,l}(\nu)\,d\nu \quad \text{and} \quad s_{k,l}(\nu) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-i\nu\tau} \rho_{k,l}(\tau)\,d\tau. \tag{3.71}$$

Theorem 3.48 The spectral densities $\{s_{k,l}(\nu)\}$ in (3.71) are such that (1) $s_{k,k}(\nu)$, $\nu \in \mathbb{R}$, are real-valued, even, positive functions, (2) the matrix $\{s_{k,l}(\nu)\}$ of spectral densities is Hermitian, that is, $s_{k,l}(\nu) = s_{l,k}(\nu)^*$ for all $\nu \in \mathbb{R}$, (3) $s_{k,l}(\nu) = s_{k,l}(-\nu)^*$, (4) $|s_{k,l}(\nu)|^2 \leq s_{k,k}(\nu)\,s_{l,l}(\nu)$, and (5) $s_{k,k}(\nu) + s_{l,l}(\nu) + 2\Re[s_{k,l}](\nu) \geq 0$, $k \neq l$.

Proof The first three properties and (3.71) follow from Theorem 3.17. For example, $s_{k,l}(\nu) = \int_{-\infty}^{\infty} e^{-i\nu\tau} \rho_{k,l}(\tau)\,d\tau/(2\pi) = \int_{\infty}^{-\infty} e^{i\nu\tau} \rho_{k,l}(-\tau)(-d\tau)/(2\pi)$ by the change of variable $\tau \mapsto -\tau$, so that $s_{k,l}(\nu) = \int_{-\infty}^{\infty} e^{i\nu\tau} \rho_{l,k}(\tau)\,d\tau/(2\pi) = s_{l,k}(\nu)^*$ since $\rho_{k,l}(\tau) = \rho_{l,k}(-\tau)$.

The proof of the fourth property can be found in [8] (Sect. 8.1) or [5] (Sect. 3.7.2.4). We prove the fifth property. The real-valued and complex-valued processes $Y_1(t) = G_k(t) + G_l(t)$ and $Y_2(t) = iG_k(t) + G_l(t)$ are weakly stationary with mean 0 and covariance functions

$$c_1(\tau) = E[Y_1(t + \tau)Y_1(t)^*] = \rho_{k,k}(\tau) + \rho_{l,l}(\tau) + \rho_{k,l}(\tau) + \rho_{l,k}(\tau) \quad \text{and}$$
$$c_2(\tau) = E[Y_2(t + \tau)Y_2(t)^*] = \rho_{k,k}(\tau) + \rho_{l,l}(\tau) + i\rho_{k,l}(\tau) - i\rho_{l,k}(\tau),$$

respectively. Let $s_p(\nu) = \int_{-\infty}^{\infty} e^{-i\nu\tau} c_p(\tau)\,d\tau/(2\pi)$ denote the spectral density functions of $Y_p(t)$, $p = 1, 2$. We have

$$s_1(\nu) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-i\nu\tau} \left[ \rho_{k,k}(\tau) + \rho_{l,l}(\tau) + \rho_{k,l}(\tau) + \rho_{l,k}(\tau) \right] d\tau = s_{k,k}(\nu) + s_{l,l}(\nu) + s_{k,l}(\nu) + s_{l,k}(\nu)$$

by (3.71), so that $s_1(\nu) = s_{k,k}(\nu) + s_{l,l}(\nu) + 2\Re[s_{k,l}](\nu)$ by property (2). Since $Y_1(t)$ is a real-valued process, its spectral density must be positive, that is, $s_{k,k}(\nu) + s_{l,l}(\nu) + 2\Re[s_{k,l}](\nu) \geq 0$ for all $\nu \in \mathbb{R}$. This inequality provides constraints on the possible values of the spectral densities $s_{k,l}(\nu)$ relative to those of $s_{k,k}(\nu)$ and $s_{l,l}(\nu)$. The spectral density of the complex-valued stochastic process $Y_2(t)$ is $s_2(\nu) = s_{k,k}(\nu) + s_{l,l}(\nu) + i\,(s_{k,l}(\nu) - s_{l,k}(\nu)) = s_{k,k}(\nu) + s_{l,l}(\nu) - 2\Im[s_{k,l}](\nu)$, $\nu \in \mathbb{R}$, and provides no additional constraints for the spectral densities of $G(t)$. ▲

Let $X(t)$ be an $\mathbb{R}^d$-valued non-Gaussian stationary process whose coordinates $X_k(t)$ have means $E[X_k(t)] = 0$, variances $\mathrm{Var}[X_k(t)] = E[X_k(t)^2] = 1$, distributions $F_k(x) = P(X_k(t) \leq x)$, and covariance functions $\xi_{k,l}(\tau) = E[X_k(t + \tau)X_l(t)]$, $k, l = 1,\ldots,d$. The assumption that the coordinates of $X(t)$ have zero means and unit variances is not restrictive since any stationary process can be modified to have these properties by shifting and scaling.

Let $X^{(T)}(t)$ be an $\mathbb{R}^d$-valued translation model for $X(t)$ defined by

$$X_k^{(T)}(t) = F_k^{-1} \circ \Phi(G_k(t)) = h_k(G_k(t)), \quad k = 1,\ldots,d, \tag{3.72}$$

where $\Phi$ denotes the distribution of $N(0, 1)$, $h_k = F_k^{-1} \circ \Phi$, and $\{G_k(t)\}$ are the coordinates of an $\mathbb{R}^d$-valued stationary Gaussian process $G(t)$ with $E[G_k(t)] = 0$, $E[G_k(t)^2] = 1$, $\rho_{k,l}(\tau) = E[G_k(t + \tau)G_l(t)]$, and $s_{k,l}(\nu) = \int_{-\infty}^{\infty} e^{-i\nu\tau} \rho_{k,l}(\tau)\,d\tau/(2\pi)$, $k, l = 1,\ldots,d$.

The marginal distributions of $X^{(T)}(t)$ coincide with those of $X(t)$ irrespective of the spectral densities $\{s_{k,l}(\nu)\}$ or, equivalently, the covariance functions $\rho_{k,l}(\tau)$ of $G(t)$, since $P(X_k^{(T)}(t) \leq x) = P(G_k(t) \leq \Phi^{-1} \circ F_k(x)) = F_k(x)$, $k = 1,\ldots,d$. On the other hand, the covariance functions,

$$\xi_{k,l}^{(T)}(\tau) = E[X_k^{(T)}(t + \tau)X_l^{(T)}(t)] = E[h_k(G_k(t + \tau))\,h_l(G_l(t))] = \int_{\mathbb{R}^2} h_k(u)\,h_l(v)\,\phi(u, v; \rho_{k,l}(\tau))\,du\,dv, \tag{3.73}$$

of $X^{(T)}(t)$ depend on both the second moment properties of $G(t)$ and the mapping in (3.72). The function $\phi(u, v; \rho)$ in (3.73) denotes the joint density of a standard bivariate Gaussian vector with correlation coefficient $\rho$.

Consider the case in which there is no translation model matching the marginal distributions and covariance functions of a target non-Gaussian vector process $X(t)$. We construct a translation process $X^{(T)}(t)$ that has the same marginal distributions as $X(t)$. The coordinates $\{G_k(t)\}$ of the Gaussian image $G(t)$ of $X^{(T)}(t)$ have mean 0 and variance 1, and their covariance functions are such that the covariance functions of $X^{(T)}(t)$ are as close as possible to those of $X(t)$ in some sense.

Suppose the spectral densities $\{s_{k,l}(\nu)\}$ of $G(t)$ have non-zero ordinates in a frequency band $(-\bar\nu, \bar\nu)$, $0 < \bar\nu < \infty$. Let $n \geq 1$ be an integer, $\Delta\nu = \bar\nu/n$, $\nu_r = (r - 1/2)\Delta\nu$ and $\nu_{-r} = -\nu_r$ for $r = 1,\ldots,n$, and approximate $\{s_{k,l}(\nu)\}$ by the discrete spectral densities

$$\tilde s_{k,l}(\nu) = \sum_{r=-n,\, r \neq 0}^{n} s_{k,l}^{(r)}\, \delta(\nu - \nu_r), \tag{3.74}$$

where $\{s_{k,l}^{(r)}\}$ and $\{s_{k,l}^{(-r)}\}$ are unknown spectral ordinates associated with the frequencies $\{\nu_r\}$ and $\{\nu_{-r}\}$. Note that $\{\tilde s_{k,l}(\nu)\}$ must have the properties stated in Theorem 3.48. If the spectral densities $\{\tilde s_{k,l}(\nu)\}$ satisfy the conditions of this theorem, then

$$\tilde\rho_{k,l}(\tau) = \int_{-\infty}^{\infty} e^{i\nu\tau}\, \tilde s_{k,l}(\nu)\,d\nu = \int_{-\infty}^{\infty} e^{i\nu\tau} \sum_{r=-n,\, r \neq 0}^{n} s_{k,l}^{(r)}\, \delta(\nu - \nu_r)\,d\nu = \sum_{r=-n,\, r \neq 0}^{n} s_{k,l}^{(r)} e^{i\nu_r\tau} = 2 \sum_{r=1}^{n} \left( \Re[s_{k,l}^{(r)}] \cos(\nu_r\tau) - \Im[s_{k,l}^{(r)}] \sin(\nu_r\tau) \right) \tag{3.75}$$

are covariance functions. Since $\{\tilde\rho_{k,l}(\tau)\}$ depend only on the ordinates $\{s_{k,l}^{(r)}\}$ of the discrete spectra of $G(t)$, so do the covariances $\{\tilde\xi_{k,l}^{(T)}(\tau) = E[X_k^{(T)}(t + \tau)X_l^{(T)}(t)]\}$ of the corresponding translation model $X^{(T)}(t)$ of $X(t)$ defined by

$$X_k^{(T)}(t) = F_k^{-1} \circ \Phi(G_k(t)) = h_k(G_k(t)), \quad k = 1,\ldots,d, \tag{3.76}$$

where $G(t) = (G_1(t),\ldots,G_d(t))$ is an $\mathbb{R}^d$-valued stationary Gaussian process with covariance functions $\{\tilde\rho_{k,l}(\tau)\}$ and spectra $\{\tilde s_{k,l}(\nu)\}$. Let

$$s_{\mathrm{vect}} = \left\{ s_{k,k}^{(r)},\ \Re[s_{k,l}^{(r)}],\ \Im[s_{k,l}^{(r)}],\ k, l = 1,\ldots,d,\ r = 1,\ldots,n \right\} \tag{3.77}$$

be a vector containing the unknown parameters in the definition of $G(t)$.
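The last expression in (3.75) is straightforward to evaluate numerically. The sketch below is an illustration under stated assumptions: the function name `discrete_covariance` is hypothetical, and the ordinates are taken as $s^{(r)} = s(\nu_r)\Delta\nu$ for a band-limited white noise with $s(\nu) = 1/(2\bar\nu)$ on $(-\bar\nu, \bar\nu)$, whose exact covariance is $\sin(\bar\nu\tau)/(\bar\nu\tau)$.

```python
import numpy as np

def discrete_covariance(s_ord, nu, tau):
    """rho(tau) = 2 sum_r (Re[s_r] cos(nu_r tau) - Im[s_r] sin(nu_r tau)), as in (3.75)."""
    s_ord = np.asarray(s_ord, dtype=complex)       # ordinates s^(r), r = 1, ..., n
    arg = np.outer(np.atleast_1d(tau), nu)         # (len(tau), n), entries nu_r * tau
    return 2.0 * (np.cos(arg) @ s_ord.real - np.sin(arg) @ s_ord.imag)

nu_bar, n = 10.0, 15
dnu = nu_bar / n
nu = (np.arange(1, n + 1) - 0.5) * dnu             # nu_r = (r - 1/2) dnu

# Assumed ordinates: s^(r) = s(nu_r) dnu with s(nu) = 1 / (2 nu_bar); they are real
# because this spectral density is real and even.
s_ord = np.full(n, dnu / (2.0 * nu_bar))

tau = np.linspace(0.0, 2.0, 50)
rho_discrete = discrete_covariance(s_ord, nu, tau)
rho_exact = np.sinc(nu_bar * tau / np.pi)          # sin(nu_bar tau) / (nu_bar tau)
```

The constraint $2\sum_{r} s_{k,k}^{(r)} = 1$ appears here as `rho_discrete[0] == 1`, that is, unit variance of the Gaussian image. The discrete covariance is periodic with period $2\pi/\nu_1$, so the agreement with the exact covariance holds only for lags well below that period.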

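Theorem 3.49 The optimal Gaussian process $G(t)$ is given by the vector $s_{\mathrm{vect}}$ in (3.77) that minimizes the objective function

$$e(s_{\mathrm{vect}}) = \max_{\tau \geq 0} \left[ \sum_{k,l=1}^{d} \left( \xi_{k,l}(\tau) - \tilde\xi_{k,l}^{(T)}(\tau) \right)^2 \right] \tag{3.78}$$

under the constraints

$$s_{k,k}^{(r)} \geq 0, \quad 2\sum_{r=1}^{n} s_{k,k}^{(r)} = 1, \quad |s_{k,l}^{(r)}|^2 \leq s_{k,k}^{(r)}\, s_{l,l}^{(r)}, \quad \text{and} \quad s_{k,k}^{(r)} + s_{l,l}^{(r)} + 2\Re[s_{k,l}^{(r)}] \geq 0, \ k \neq l, \tag{3.79}$$

for all $k, l = 1,\ldots,d$ and $r = 1,\ldots,n$.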


The calculation of the covariance functions $\tilde\xi_{k,l}^{(T)}(\tau)$ of the translation model $X^{(T)}(t)$ with Gaussian image $G(t)$ involves numerical evaluations of double integrals, which may render the optimization algorithm in (3.78) and (3.79) unfeasible. This difficulty can be resolved by precalculating and storing the mappings $\rho \mapsto \xi$ between correlation coefficients of standard bivariate Gaussian vectors $(G_k, G_l)$ and those of bivariate non-Gaussian vectors $(F_k^{-1} \circ \Phi(G_k), F_l^{-1} \circ \Phi(G_l))$. These mappings are monotonically increasing ([4], Sect. 3.1.1) and can be used to find the covariances $\tilde\xi_{k,l}^{(T)}(\tau)$ from $\tilde\rho_{k,l}(\tau)$ and evaluate the objective function in (3.78).
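One possible way to precalculate and store a mapping $\rho \mapsto \xi$ is sketched below. It is not the procedure of the text: the double integral in (3.73) is evaluated here by a tensorized Gauss-Hermite rule, which is a choice made for this illustration, and the names `translation_correlation` and `rho_from_xi`, the grid of 41 correlation values, and the cubic map $h(g) = g^3$ are assumptions. For the cubic map, Gaussian moment formulas give the closed form $\xi = \rho(3 + 2\rho^2)/5$, which provides a check.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def translation_correlation(h_k, h_l, rho, order=40):
    """Correlation of (h_k(G_k), h_l(G_l)) for standard Gaussian (G_k, G_l) with correlation rho."""
    x, w = hermegauss(order)                       # nodes and weights for the weight exp(-x^2 / 2)
    w = w / np.sqrt(2.0 * np.pi)                   # normalise to the N(0, 1) density
    u, z = np.meshgrid(x, x, indexing="ij")
    ww = np.outer(w, w)                            # product weights for independent (u, z)
    v = rho * u + np.sqrt(1.0 - rho**2) * z        # v is N(0, 1) with correlation rho to u
    mean_k, mean_l = np.sum(w * h_k(x)), np.sum(w * h_l(x))
    var_k = np.sum(w * h_k(x) ** 2) - mean_k**2
    var_l = np.sum(w * h_l(x) ** 2) - mean_l**2
    cross = np.sum(ww * h_k(u) * h_l(v)) - mean_k * mean_l
    return cross / np.sqrt(var_k * var_l)

cube = lambda g: g**3                              # illustrative translation map (assumption)
rho_grid = np.linspace(-1.0, 1.0, 41)
xi_grid = np.array([translation_correlation(cube, cube, r) for r in rho_grid])

def rho_from_xi(xi):
    """Invert the stored monotone map rho -> xi by linear interpolation."""
    return np.interp(xi, xi_grid, rho_grid)
```

Once `xi_grid` is stored, each evaluation of the objective function requires only table look-ups of $\tilde\rho_{k,l}(\tau) \mapsto \tilde\xi_{k,l}^{(T)}(\tau)$ rather than new double integrals. The inversion by `np.interp` relies on the monotonicity of the mapping.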

Once a translation model $X^{(T)}(t)$ has been constructed, the following algorithm can be used for sample generation. First, specify a cutoff frequency $\bar\nu > 0$ for the coordinates of $G(t)$ and select discrete frequencies in the frequency band considered for $G(t)$. Second, generate samples of $G(t)$ defined by

$$G(t) = \sum_{r=-n,\, r \neq 0}^{n} C_r\, e^{i\nu_r t}, \tag{3.80}$$

where the coefficients $\{C_r\}$ are $\mathbb{C}^d$-valued Gaussian variables such that $E[G(t)] = 0$ and $E[G_k(s)G_l(t)] = \tilde\rho_{k,l}(t - s)$ (Example 3.15). The generation of samples of $G(t)$ can be based on the algorithm outlined in Sect. 3.8.1.2 using the representation in (3.68), which is an alternative form of (3.80).

Example 3.59 Let $G(t) = (G_1(t), G_2(t))$ be a bivariate stationary Gaussian process defined by $G_k(t) = \sqrt{1 - \theta}\,Z_k(t) + \sqrt{\theta}\,Z(t)$, $k = 1, 2$, where $\theta \in [0, 1]$ and $Z_k(t)$, $k = 1, 2$, and $Z(t)$ are independent stationary Gaussian processes with zero means and spectral densities $s_{Z_k}(\nu) = 1(-\nu_{Z_k} < \nu < \nu_{Z_k})/(2\nu_{Z_k})$, $0 < \nu_{Z_k} < \infty$, and $s_Z(\nu) = 1(-\nu_Z < \nu < \nu_Z)/(2\nu_Z)$, $0 < \nu_Z < \infty$. The spectral densities and covariance functions of $G(t)$ are

$$s_{k,l}(\nu) = (1 - \theta)\,\delta_{kl}\, s_{Z_k}(\nu) + \theta\, s_Z(\nu) \quad \text{and} \quad \rho_{k,l}(\tau) = (1 - \theta)\,\delta_{kl}\, \frac{\sin(\nu_{Z_k}\tau)}{\nu_{Z_k}\tau} + \theta\, \frac{\sin(\nu_Z\tau)}{\nu_Z\tau}. \tag{3.81}$$


Define an $\mathbb{R}^2$-valued translation non-Gaussian process $X^{(T)}(t)$ by (3.72) with

$$F_1(x) = \Phi(\ln(x - a)/\sigma), \quad a, \sigma > 0, \ x > a,$$
$$F_2(x) = \Phi(\mathrm{sign}(x)\,|x|^{1/3}), \quad x \in \mathbb{R}, \tag{3.82}$$

and suppose that $X(t) = X^{(T)}(t)$ is the target non-Gaussian process.

The covariance functions of the coordinates of $X(t)$ are related to those of $G(t)$ by ([4], Sect. 3.1.1)

$$\xi_{1,1}(\tau) = \frac{1 - e^{\sigma^2 \rho_{1,1}(\tau)}}{1 - e^{\sigma^2}} \quad \text{and} \quad \xi_{2,2}(\tau) = \frac{1}{5}\,\rho_{2,2}(\tau)\left(3 + 2\rho_{2,2}(\tau)^2\right). \tag{3.83}$$

The relationship between $\rho_{1,2}(\tau)$ and $\xi_{1,2}(\tau)$ cannot be obtained analytically. Figure 3.12 shows an estimate of this relationship calculated by Monte Carlo simulation from (3.73) using $10^6$ samples of bivariate standard Gaussian vectors with correlation coefficients $\rho_{1,2} \in [-1, 1]$.
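A Monte Carlo estimate of this kind can be sketched as follows. The translation maps follow from (3.82): $F_1^{-1} \circ \Phi(g) = a + e^{\sigma g}$ and $F_2^{-1} \circ \Phi(g) = g^3$. The function name `mc_translation_correlation`, the seven-point grid of correlations, the fixed seed, and the use of sample correlation coefficients for normalisation are assumptions of this illustration, not details given in the text.

```python
import numpy as np

def mc_translation_correlation(h_k, h_l, rho, n_samples, rng):
    """Monte Carlo estimate of the correlation of (h_k(G_1), h_l(G_2)), corr(G_1, G_2) = rho."""
    z1 = rng.standard_normal(n_samples)
    z2 = rng.standard_normal(n_samples)
    g1 = z1
    g2 = rho * z1 + np.sqrt(1.0 - rho**2) * z2     # standard Gaussian with correlation rho to g1
    return float(np.corrcoef(h_k(g1), h_l(g2))[0, 1])

a, sigma = 1.0, 1.0
h1 = lambda g: a + np.exp(sigma * g)               # F_1^{-1} o Phi for F_1(x) = Phi(ln(x - a) / sigma)
h2 = lambda g: g**3                                # F_2^{-1} o Phi for F_2(x) = Phi(sign(x) |x|^(1/3))

rho_values = np.linspace(-0.9, 0.9, 7)
# The same seed is reused for every rho (common random numbers) so the estimated curve is smooth.
xi_12 = [mc_translation_correlation(h1, h2, r, 10**6, np.random.default_rng(0)) for r in rho_values]
```

The same function can be checked against the closed forms in (3.83) by passing `(h1, h1)` or `(h2, h2)`. The cubic map has heavy tails, so estimates with far fewer than $10^6$ samples are noticeably noisier.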

Consider a discrete representation $\tilde s_{k,l}(\nu)$ of the spectral densities $s_{k,l}(\nu)$ of $G(t)$ as given by (3.74). The representation depends on the spectral ordinates $s_{k,k}^{(r)} \in \mathbb{R}$, $k = 1, 2$, $\Re[s_{1,2}^{(r)}] \in \mathbb{R}$, and $\Im[s_{1,2}^{(r)}] \in \mathbb{R}$, $r = 1,\ldots,n$. Let, as in (3.77),

$$s_{\mathrm{vect}} = \left\{ s_{1,1}^{(1)},\ldots,s_{1,1}^{(n)},\ s_{2,2}^{(1)},\ldots,s_{2,2}^{(n)},\ \Re[s_{1,2}^{(1)}],\ldots,\Re[s_{1,2}^{(n)}],\ \Im[s_{1,2}^{(1)}],\ldots,\Im[s_{1,2}^{(n)}] \right\} \tag{3.84}$$

be a vector in $\mathbb{R}^{4n}$ including the unknown parameters in the discrete version of the spectral densities of $G(t)$. Our objective is to find an optimal $s_{\mathrm{vect}}$ in the sense that it minimizes the objective function in (3.78). The covariance functions $\tilde\xi_{k,l}^{(T)}(\tau)$ in (3.78) are images of the covariance functions $\tilde\rho_{k,l}(\tau)$ given by (3.75), and can be obtained from (3.83) and Fig. 3.12.

Fig. 3.12 An estimate of the relationship between correlation coefficients $\rho_{1,2}$ of $(G_1, G_2)$ and $\xi_{1,2}$ of $(F_1^{-1} \circ \Phi(G_1), F_2^{-1} \circ \Phi(G_2))$

Fig. 3.13 Target and optimal covariance functions in the Gaussian space, $\rho_{k,l}(\tau)$ and $\tilde\rho_{k,l}(\tau)$

Numerical results have been obtained for $\nu_{Z_k} = 10$, $k = 1, 2$, $\nu_Z = 5$, $n = 15$, $a = 1$, $\sigma = 1$, and $\theta = 0.5$. Figure 3.13 shows with solid and dotted lines the target covariance functions $\rho_{k,l}(\tau)$ and the optimal covariance functions $\tilde\rho_{k,l}(\tau)$ in the Gaussian space. Target and translation covariance functions, $\xi_{k,l}(\tau)$ and $\tilde\xi_{k,l}^{(T)}(\tau)$, are shown with solid and dotted lines in Fig. 3.14. The optimal covariance functions nearly coincide with the target covariance functions, an expected result since $X(t)$ is a translation process so that it can be mapped exactly into a Gaussian process. ◊

The discrete approximation of the spectral densities $\{s_{k,l}(\nu)\}$ of $G(t)$ in (3.74) is feasible for stochastic processes. However, it is impractical for random fields defined on $\mathbb{R}^{d'}$, $d' \geq 2$, since the dimension of the vector $s_{\mathrm{vect}}$ of unknown parameters in (3.77) is excessive. For translation random fields, it is convenient to approximate the spectral densities of their Gaussian images by finite sums of specified functions weighted by unknown coefficients. The resulting representations of the spectral densities must satisfy conditions similar to those in Theorem 3.48.

Fig. 3.14 Target and optimal covariance functions in the translation space, $\xi_{k,l}(\tau)$ and $\tilde\xi_{k,l}^{(T)}(\tau)$


3.8.3 Non-Stationary Gaussian Processes

Let $X$ be a real-valued non-stationary Gaussian process with mean 0, covariance function $c(s, t) = E[X(s)X(t)]$, and generalized spectral density $s(\nu, \eta)$, where ([30], Sect. 12-2)

$$s(\nu, \eta) = \frac{1}{(2\pi)^2} \int_{\mathbb{R}^2} c(s, t)\, e^{-i(\nu s - \eta t)}\,ds\,dt, \qquad c(s, t) = \int_{\mathbb{R}^2} s(\nu, \eta)\, e^{i(\nu s - \eta t)}\,d\nu\,d\eta. \tag{3.85}$$

We construct approximations for the covariance function $c(s, t)$ by (1) truncating the frequency band of this process, that is, replacing $s(\nu, \eta)$ with $s(\nu, \eta)\,1((\nu, \eta) \in I(\bar\nu, \bar\eta))$, where $I(\bar\nu, \bar\eta) = [-\bar\nu, \bar\nu] \times [-\bar\eta, \bar\eta]$, $0 < \bar\nu, \bar\eta < \infty$, is a bounded rectangle such that $\int_{I(\bar\nu, \bar\eta)} s(\nu, \eta)\, e^{i(\nu - \eta)t}\,d\nu\,d\eta \simeq \int_{\mathbb{R}^2} s(\nu, \eta)\, e^{i(\nu - \eta)t}\,d\nu\,d\eta = c(t, t)$ at all times and (2) partitioning $I(\bar\nu, \bar\eta)$ in sufficiently small subsets such that $s(\nu, \eta)$ is nearly constant in each of these subsets.

The resulting approximations of $c(s, t)$ are used to construct a sequence of processes converging to $X$. Denote by $\tilde c(s, t)$ an approximation of $c(s, t)$ corresponding to the truncated spectral density $s(\nu, \eta)\,1((\nu, \eta) \in I(\bar\nu, \bar\eta))$, that is,

$$\tilde c(s, t) = \int_{I(\bar\nu, \bar\eta)} s(\nu, \eta)\, e^{i(\nu s - \eta t)}\,d\nu\,d\eta, \quad (s, t) \in \mathbb{R}^2. \tag{3.86}$$

Let $J(\bar\tau, \bar\rho) = [-\bar\tau, \bar\tau] \times [-\bar\rho, \bar\rho] \subset \mathbb{R}^2$, $0 < \bar\tau, \bar\rho < \infty$, be a bounded rectangle in the space of temporal coordinates, and approximate $s(\nu, \eta)$ by

$$\tilde s(\nu, \eta) = \frac{1}{(2\pi)^2} \int_{J(\bar\tau, \bar\rho)} c(s, t)\, e^{-i(\nu s - \eta t)}\,ds\,dt, \quad (\nu, \eta) \in \mathbb{R}^2. \tag{3.87}$$

Theorem 3.50 If $c(s, t)$ is continuous, has bounded partial derivatives $\partial c(s, t)/\partial s$ and $\partial c(s, t)/\partial t$, and the mixed partial derivative $\partial^2 c(s, t)/\partial s\,\partial t$ exists in $J(\bar\tau, \bar\rho)$, then the Fourier series of $c(s, t)$ restricted to the bounded rectangle $J(\bar\tau, \bar\rho)$ converges to $c(s, t)$ at all interior points of this rectangle.

Proof The proof can be found in [31], Sect. 7.3. The meaning of this theorem is that the partial sums,

$$\sigma_{p,q}(s, t) = \sum_{k=-p}^{p} \sum_{l=-q}^{q} \zeta_{k,l}\, e^{i\pi(ks/\bar\tau - lt/\bar\rho)}, \quad \text{where} \quad \zeta_{k,l} = \frac{1}{4\bar\tau\bar\rho} \int_{J(\bar\tau, \bar\rho)} c(s, t)\, e^{-i\pi(ks/\bar\tau - lt/\bar\rho)}\,ds\,dt, \tag{3.88}$$

of the double Fourier series of $c(s, t)$ restricted to $J(\bar\tau, \bar\rho)$ converge to $c(s, t)$ as $p, q \to \infty$ at all interior points of $J(\bar\tau, \bar\rho)$. Hence, the error $|\sigma_{p,q}(s, t) - c(s, t)|$ of the approximation $\sigma_{p,q}(s, t)$ of $c(s, t)$ can be made as small as desired in $J(\bar\tau, \bar\rho)$ by increasing $p$ and $q$. ▲

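Theorem 3.51 If $s(\nu, \eta)$ is absolutely integrable in $\mathbb{R}^2$, then

$$|c(s, t) - \tilde c(s, t)| = \left| \int_{I(\bar\nu, \bar\eta)^c} s(\nu, \eta)\, e^{i(\nu s - \eta t)}\,d\nu\,d\eta \right| \leq \int_{I(\bar\nu, \bar\eta)^c} |s(\nu, \eta)|\,d\nu\,d\eta, \tag{3.89}$$

so that the approximate covariance function $\tilde c(s, t)$ of $X(t)$ converges to $c(s, t)$ as $\bar\nu, \bar\eta \to \infty$. If $c(s, t)$ is absolutely integrable in $\mathbb{R}^2$, then

$$|\tilde s(\nu, \eta) - s(\nu, \eta)| = \frac{1}{(2\pi)^2} \left| \int_{J(\bar\tau, \bar\rho)^c} c(s, t)\, e^{-i(\nu s - \eta t)}\,ds\,dt \right| \leq \frac{1}{(2\pi)^2} \int_{J(\bar\tau, \bar\rho)^c} |c(s, t)|\,ds\,dt, \tag{3.90}$$

so that $\tilde s(\nu, \eta)$ converges to $s(\nu, \eta)$ as $\bar\tau, \bar\rho \to \infty$.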

Proof The bound on the discrepancy between $c(s, t)$ and $\tilde c(s, t)$ in (3.89) implies $\lim_{\bar\nu, \bar\eta \to \infty} \tilde c(s, t) = c(s, t)$ at each $(s, t)$ since $|s(\nu, \eta)|$ is integrable in $\mathbb{R}^2$ by assumption. The requirement that $s(\nu, \eta)$ is absolutely integrable in $\mathbb{R}^2$ is stronger than $c(t, t) < \infty$ at all times since $c(t, t) = \left| \int_{\mathbb{R}^2} s(\nu, \eta)\, e^{i(\nu - \eta)t}\,d\nu\,d\eta \right| \leq \int_{\mathbb{R}^2} |s(\nu, \eta)|\,d\nu\,d\eta$. For the special case in which $s(\nu, \eta)$ has a bounded support, the function $\tilde c(s, t)$ coincides with the target covariance function $c(s, t)$ provided $I(\bar\nu, \bar\eta)$ includes the support of $s(\nu, \eta)$.

The bound on $|\tilde s(\nu, \eta) - s(\nu, \eta)|$ in (3.90) implies the stated convergence under the assumption that $c(s, t)$ is absolutely integrable in $\mathbb{R}^2$. ▲

Let $I(\bar\nu, \bar\eta)$ be a bounded rectangle as in (3.86) that may or may not include the support of $s(\nu, \eta)$. For arbitrary integers $m, n \geq 1$, set $\Delta\nu = \bar\nu/m$, $\Delta\eta = \bar\eta/n$, $\nu_k = k\Delta\nu$ for $k = 0, 1,\ldots,m$, $\nu_{-k} = -\nu_k$ for $k = 1, 2,\ldots,m$, $\eta_l = l\Delta\eta$ for $l = 0, 1,\ldots,n$, $\eta_{-l} = -\eta_l$ for $l = 1, 2,\ldots,n$, and

$$I_{0,0} = [-\Delta\nu/2, \Delta\nu/2) \times [-\Delta\eta/2, \Delta\eta/2),$$
$$I_{k,0} = [\nu_k - \Delta\nu/2, \nu_k + \Delta\nu/2) \times [-\Delta\eta/2, \Delta\eta/2), \quad k \in \{-m,\ldots,-1, 1,\ldots,m\},$$
$$I_{0,l} = [-\Delta\nu/2, \Delta\nu/2) \times [\eta_l - \Delta\eta/2, \eta_l + \Delta\eta/2), \quad l \in \{-n,\ldots,-1, 1,\ldots,n\},$$
$$I_{k,l} = [\nu_k - \Delta\nu/2, \nu_k + \Delta\nu/2) \times [\eta_l - \Delta\eta/2, \eta_l + \Delta\eta/2), \quad (k, l) \in \{-m,\ldots,-1, 1,\ldots,m\} \times \{-n,\ldots,-1, 1,\ldots,n\}. \tag{3.91}$$

The rectangles $\{I_{k,l}\}$ partition $I(\bar\nu, \bar\eta)$, that is, they are disjoint sets such that $I(\bar\nu, \bar\eta) = \cup_{k=-m}^{m} \cup_{l=-n}^{n} I_{k,l}$, so that $\tilde c(s, t)$ in (3.86) becomes

$$\tilde c(s, t) = \sum_{k=-m}^{m} \sum_{l=-n}^{n} \int_{I_{k,l}} s(\nu, \eta)\, e^{i(\nu s - \eta t)}\,d\nu\,d\eta, \tag{3.92}$$

and can be approximated by

$$\tilde c_{m,n}(s, t) = \sum_{k=-m}^{m} \sum_{l=-n}^{n} s_{k,l}\, e^{i(\nu_k s - \eta_l t)}, \tag{3.93}$$

where $s_{k,l} = \int_{I_{k,l}} s(\nu, \eta)\,d\nu\,d\eta \simeq s(\nu_k, \eta_l)\,\Delta\nu\,\Delta\eta$. The latter approximation holds if $s(\nu, \eta)$ is nearly constant in $I_{k,l}$. Note that $\tilde c_{m,n}(s, t)$ in (3.93) corresponds to the discrete approximation $\sum_{k=-m}^{m} \sum_{l=-n}^{n} s_{k,l}\, \delta(\nu - \nu_k)\, \delta(\eta - \eta_l)$ of the truncated generalized spectral density $s(\nu, \eta)\,1((\nu, \eta) \in I(\bar\nu, \bar\eta))$.

Theorem 3.52 If $c(s, t)$ and $s(\nu, \eta)$ satisfy the conditions in Theorems 3.50 and 3.51, and $s(\nu, \eta)$ is continuous, the discrepancy $|\tilde c_{m,n}(s, t) - \sigma_{p,q}(s, t)|$ between $\tilde c_{m,n}(s, t)$ and $\sigma_{p,q}(s, t)$ can be made arbitrarily small in $J(\bar\tau, \bar\rho)$ for sufficiently large truncation levels in both frequency and time domains as $m, n \to \infty$ and $p, q \to \infty$.

Proof The approximations $\tilde c_{m,n}(s, t)$ and $\sigma_{p,q}(s, t)$ of $c(s, t)$ correspond to truncations of the generalized spectral density $s(\nu, \eta)$ and of the covariance function $c(s, t)$, respectively. For arbitrary but fixed $I(\bar\nu, \bar\eta)$ and $J(\bar\tau, \bar\rho)$ we have

$$|\tilde c_{m,n}(s, t) - \sigma_{p,q}(s, t)| \leq |\tilde c_{m,n}(s, t) - \tilde c(s, t)| + |\tilde c(s, t) - c(s, t)| + |c(s, t) - c(s, t)\,1((s, t) \in J(\bar\tau, \bar\rho))| + |c(s, t)\,1((s, t) \in J(\bar\tau, \bar\rho)) - \sigma_{p,q}(s, t)|.$$

The first term can be bounded by

$$|\tilde c_{m,n}(s, t) - \tilde c(s, t)| \leq \sum_{k=-m}^{m} \sum_{l=-n}^{n} \int_{I_{k,l}} |s(\nu, \eta)|\, \left| e^{i(\nu s - \eta t)} - e^{i(\nu_k s - \eta_l t)} \right| d\nu\,d\eta$$
$$\sim O(\max(\Delta\nu, \Delta\eta)) \sum_{k=-m}^{m} \sum_{l=-n}^{n} \int_{I_{k,l}} |s(\nu, \eta)|\,d\nu\,d\eta = O(\max(\Delta\nu, \Delta\eta)) \int_{I(\bar\nu, \bar\eta)} |s(\nu, \eta)|\,d\nu\,d\eta,$$

so that $|\tilde c_{m,n}(s, t) - \tilde c(s, t)| \to 0$ as $m, n \to \infty$ since $\Delta\nu = \bar\nu/m$, $\Delta\eta = \bar\eta/n$, and $s(\nu, \eta)$ is assumed to be continuous and absolutely integrable. The convergence $|c(s, t)\,1((s, t) \in J(\bar\tau, \bar\rho)) - \sigma_{p,q}(s, t)| \to 0$ as $p, q \to \infty$ for $(s, t) \in J(\bar\tau, \bar\rho)$ holds by Theorem 3.50, so that the fourth term can be made as small as desired. The terms $|\tilde c(s, t) - c(s, t)|$ and $|c(s, t) - c(s, t)\,1((s, t) \in J(\bar\tau, \bar\rho))|$ can be made arbitrarily small by increasing the truncation levels in the frequency and the time domains, that is, the sizes of the rectangles $I(\bar\nu, \bar\eta)$ and $J(\bar\tau, \bar\rho)$ (Theorem 3.51). ▲

Theorem 3.53 If $\bar\nu/m = \pi/\bar\tau$, $\bar\eta/n = \pi/\bar\rho$, $m = p$, $n = q$, and the conditions in Theorem 3.52 hold, the discrepancy between the coefficients $\zeta_{k,l}$ and $s_{k,l}$ of $\sigma_{m,n}(s, t)$ in (3.88) and $\tilde c_{m,n}(s, t)$ in (3.93) can be made arbitrarily small for sufficiently large truncation levels in both frequency and time domains as $m, n \to \infty$ and $p, q \to \infty$.

Proof If $\bar\nu/m = \pi/\bar\tau$, $\bar\eta/n = \pi/\bar\rho$, $m = p$, and $n = q$, then $\sigma_{m,n}(s, t)$ and $\tilde c_{m,n}(s, t)$ have the same frequencies, so that $\sigma_{m,n}(s, t) - \tilde c_{m,n}(s, t) = \sum_{k=-m}^{m} \sum_{l=-n}^{n} (\zeta_{k,l} - s_{k,l})\, e^{i(\nu_k s - \eta_l t)}$. Since $|\sigma_{m,n}(s, t) - \tilde c_{m,n}(s, t)|$ can be made arbitrarily small for sufficiently large truncation levels as $m, n \to \infty$ and $p, q \to \infty$ (Theorem 3.52), we can approximate $\{\zeta_{k,l}\}$ by $\{s_{k,l}\}$, so that the coefficients of $\sigma_{m,n}(s, t)$ can be obtained from ordinates of the generalized spectral density $s(\nu, \eta)$ of $X(t)$.

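Note that the Fourier transform,

$$\tilde s_{m,n}(\nu, \eta) = \frac{1}{(2\pi)^2} \int_{\mathbb{R}^2} \left[ \sum_{k=-m}^{m} \sum_{l=-n}^{n} s_{k,l}\, e^{i(\nu_k s - \eta_l t)} \right] e^{-i(\nu s - \eta t)}\,ds\,dt = \sum_{k=-m}^{m} \sum_{l=-n}^{n} s_{k,l}\, \delta(\nu - \nu_k)\, \delta(\eta - \eta_l), \tag{3.94}$$

of $\tilde c_{m,n}(s, t)$ in (3.93) is a generalized spectral density with power at frequencies $(\nu_k, \eta_l)$. The expression of $\tilde s_{m,n}(\nu, \eta)$ results from the definition of the generalized spectral density in (3.85), the approximation $\tilde c_{m,n}(s, t)$ of $c(s, t)$, and properties of the $\delta$-function. ▲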

Following is a Monte Carlo algorithm for generating samples of a real-valued non-stationary Gaussian process $X(t)$ with mean 0, covariance function $c(s, t) = E[X(s)X(t)]$, and generalized spectral density $s(\nu, \eta)$ in a bounded time interval $[0, \tau]$ based on the sequence of processes

$$X^{(n)}(t) = \sum_{k=-n}^{n} C_k\, e^{i\nu_k t} = \frac{A_0}{2} + \sum_{k=1}^{n} \left( A_k \cos(\nu_k t) + B_k \sin(\nu_k t) \right), \quad n = 1, 2,\ldots, \tag{3.95}$$

where $\nu_k = k\Delta\nu$, $\Delta\nu = \bar\nu/n$, $\{C_k\}$ are complex-valued Gaussian variables with mean 0 and covariances $E[C_k C_l^*] = s_{k,l}$, $k, l = -n,\ldots,n$, and $A_0 = 2C_0$, $A_k = C_k + C_{-k}$, and $B_k = i\,(C_k - C_{-k})$, $k = 1,\ldots,n$, are real-valued Gaussian variables. Note that $X^{(n)}(t)$ are real-valued Gaussian processes, $E[X^{(n)}(t)] = E[X(t)] = 0$ at all times $t$, and $E[X^{(n)}(s)X^{(n)}(t)^*] = \sum_{k,l=-n}^{n} s_{k,l} \exp(i\,(\nu_k s - \nu_l t))$ converges to $c(s, t)$ as the truncation level $\bar\nu$ and the resolution $n$ increase indefinitely (Theorems 3.50-3.53).

The generation of samples of $X^{(n)}(t)$ in $[0, \tau]$ involves the following three steps. First, if $s(\nu, \eta)$ has unbounded support, select a bounded frequency range $I(\bar\nu, \bar\nu)$, $0 < \bar\nu < \infty$, such that $\int_{I(\bar\nu, \bar\nu)} s(\nu, \eta)\, e^{i(\nu - \eta)t}\,d\nu\,d\eta \simeq \int_{\mathbb{R}^2} s(\nu, \eta)\, e^{i(\nu - \eta)t}\,d\nu\,d\eta = c(t, t)$. Otherwise, select $I(\bar\nu, \bar\nu)$ such that it includes the support of $s(\nu, \eta)$. Construct a partition $\{I_{k,l}\}$ of $I(\bar\nu, \bar\nu)$ as defined by (3.91) with $m = n$, and impose the condition $2\pi/\nu_1 = 2\pi/\Delta\nu = 2\pi n/\bar\nu > \tau$ to assure that the samples of $X^{(n)}(t)$ are not periodic in $[0, \tau]$. Second, calculate the covariances of the real-valued dependent Gaussian random variables $\{A_0, A_k, B_k, k = 1,\ldots,n\}$ from those of $\{C_k, k = -n,\ldots,n\}$, and generate samples of $\{A_0, A_k, B_k, k = 1,\ldots,n\}$ and corresponding samples of $\{C_k, k = -n,\ldots,n\}$. Third, calculate samples of $X^{(n)}(t)$, $t \in [0, \tau]$, from (3.95).
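The second and third steps can be sketched for the real form of (3.95). The sketch starts from a given covariance matrix of the real coefficients $(A_0, A_1,\ldots,A_n, B_1,\ldots,B_n)$ and does not derive that matrix from the ordinates $s_{k,l}$. The function names, the coefficient ordering, the randomly generated positive definite matrix `cov_dependent`, and the use of a Cholesky factor to generate dependent Gaussian variables are assumptions made for this illustration.

```python
import numpy as np

def basis(nu, t):
    """Rows [1/2, cos(nu_k t), sin(nu_k t)], k = 1, ..., n, of the real form of (3.95)."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    arg = np.outer(t, nu)
    return np.hstack([np.full((t.size, 1), 0.5), np.cos(arg), np.sin(arg)])

def sample_process(cov_coeff, nu, t, n_samples, rng):
    """Samples of A_0/2 + sum_k [A_k cos(nu_k t) + B_k sin(nu_k t)] with dependent coefficients."""
    chol = np.linalg.cholesky(cov_coeff)           # requires a positive definite matrix
    coeff = rng.standard_normal((n_samples, cov_coeff.shape[0])) @ chol.T
    return coeff @ basis(nu, t).T                  # (n_samples, len(t))

def model_covariance(cov_coeff, nu, s, t):
    """E[X(s) X(t)] implied by the covariance matrix of (A_0, A_k, B_k)."""
    return basis(nu, s) @ cov_coeff @ basis(nu, t).T

n, nu_bar = 4, 8.0
nu = np.arange(1, n + 1) * (nu_bar / n)            # nu_k = k dnu, so 2 pi / nu_1 = pi > 1
rng = np.random.default_rng(3)
m = rng.standard_normal((2 * n + 1, 2 * n + 1))
cov_dependent = m @ m.T + 0.1 * np.eye(2 * n + 1)  # hypothetical covariance of (A_0, A_k, B_k)
x = sample_process(cov_dependent, nu, np.linspace(0.0, 1.0, 50), 3, rng)
```

With a diagonal coefficient covariance in which $A_k$ and $B_k$ share the same variance, `model_covariance` depends only on $s - t$, which is the stationary case. Off-diagonal entries make the covariance depend on $s$ and $t$ separately.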

The algorithm is conceptually analogous to algorithms for generating samples of stationary Gaussian processes based on the spectral representation theorem (Sect. 3.8.1). Both algorithms approximate target spectral densities by discrete spectral densities with power at a finite number of frequencies and represent Gaussian processes by finite sums of harmonics with Gaussian coefficients. The only difference between these representations is that their coefficients are independent for stationary processes and dependent for non-stationary processes.

Example 3.60 Let $X(t) = U(h(t))$ be a real-valued process, where $h : \mathbb{R} \to \mathbb{R}$ is a monotonically increasing, continuous function such that $h(0) = 0$ and $U(t)$ is a stationary Gaussian process with mean 0 and covariance function $E[U(s)U(t)] = c_{st}(s - t) = (1 + \lambda|s - t|) \exp(-\lambda|s - t|)$, $\lambda > 0$. The process $X(t)$ is Gaussian since the vectors $(X(t_1),\ldots,X(t_r))$ and $(U(h(t_1)),\ldots,U(h(t_r)))$ are equal in distribution for any integer $r \geq 1$ and times $t_1 < \cdots < t_r$, and $(U(h(t_1)),\ldots,U(h(t_r)))$ is a Gaussian vector. The mean and covariance functions of $X(t)$ are $E[X(t)] = E[U(h(t))] = 0$ and $c(s, t) = E[X(s)X(t)] = E[U(h(s))U(h(t))] = c_{st}(h(s) - h(t))$, so that $X(t)$ is a non-stationary Gaussian process.

Numerical results in Figs. 3.15-3.17 are for $h(t) = t^2\,\mathrm{sign}(t)$ and $\lambda = 1$. Figure 3.15 shows the real part of the approximate spectral densities $\tilde s(\nu, \eta)$ in (3.87) for $\bar\tau = \bar\rho = 1$ (left panel) and $\bar\tau = \bar\rho = 10$ (right panel); the accuracy of the calculated generalized spectral density depends strongly on the size of the integration domain $J(\bar\tau, \bar\rho) = [-\bar\tau, \bar\tau] \times [-\bar\rho, \bar\rho]$. Integration domains $J(\bar\tau, \bar\rho)$ with $\bar\tau = \bar\rho > 10$ yield insignificant changes in the real part of $\tilde s(\nu, \eta)$. For all values of $\bar\tau = \bar\rho$, the magnitudes of the ordinates of the imaginary part of $\tilde s(\nu, \eta)$ are of order $10^{-16}$. The right and left panels in Figs. 3.16 and 3.17 show the covariance function

$$c(s, t) = \left(1 + \lambda\,|s^2\,\mathrm{sign}(s) - t^2\,\mathrm{sign}(t)|\right) \exp\left(-\lambda\,|s^2\,\mathrm{sign}(s) - t^2\,\mathrm{sign}(t)|\right) \tag{3.96}$$

of $X(t)$ and approximations of this covariance function. The left panel in Fig. 3.16 is $\tilde c_{n,n}(s, t)$ in (3.93) with $m = n = 400$ and $\bar\nu = \bar\eta = 30$. The left panel in Fig. 3.17 is an estimate of $c(s, t)$ obtained from $n_s = 1000$ independent samples of $X^{(n)}(t)$ in (3.95) with $n = 400$ generated in $[0, \tau]$ with $\tau = 10$. The approximate covariance function and the estimate of this function are satisfactory. Further improvements can be obtained by increasing the resolution of the model $X^{(n)}(t)$ of $X(t)$, which can be achieved by using larger $\bar\nu = \bar\eta$ and/or $m = n$. ◊
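The covariance function (3.96) and a sample-based estimate of it can be reproduced on a small grid. The sketch below does not use the spectral algorithm of this section: samples are drawn directly from the covariance matrix by an eigenvalue factorisation, which is a substitute technique chosen here for brevity. The grid on $[0, 3]$, the seed, and the function names are assumptions.

```python
import numpy as np

def h(t):
    """Time change h(t) = t^2 sign(t) of Example 3.60."""
    return np.sign(t) * t**2

def c_st(tau, lam=1.0):
    """Stationary covariance (1 + lam |tau|) exp(-lam |tau|) of U."""
    return (1.0 + lam * np.abs(tau)) * np.exp(-lam * np.abs(tau))

def c(s, t, lam=1.0):
    """Covariance (3.96) of X(t) = U(h(t))."""
    return c_st(h(s) - h(t), lam)

t = np.linspace(0.0, 3.0, 61)                      # hypothetical time grid
cov = c(t[:, None], t[None, :])                    # covariance matrix on the grid

# Direct sampling on the grid; eigenvalues are clipped at zero to absorb round-off.
w, v = np.linalg.eigh(cov)
root = v * np.sqrt(np.clip(w, 0.0, None))
rng = np.random.default_rng(4)
samples = (root @ rng.standard_normal((t.size, 1000))).T   # n_s = 1000 samples
cov_hat = samples.T @ samples / samples.shape[0]           # estimate of c(s, t)
```

The non-stationarity is visible in the values themselves: the lag is 1 in both `c(0.0, 1.0)` and `c(1.0, 2.0)`, but the covariances differ because $h(1) - h(0) = 1$ while $h(2) - h(1) = 3$.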

Additional information on the spectral-based Monte Carlo algorithm for generating samples of non-stationary Gaussian processes discussed in this section can be found in [32]. In this reference, it is also shown that the algorithm is sufficiently stable to be applied even to stationary processes, a severe test since the generalized spectral density for a stationary process $X(t)$ with spectral density $s_{st}(\nu)$ has the functional form $s(\nu, \eta) = s_{st}(\nu)\,\delta(\nu - \eta)$, $(\nu, \eta) \in \mathbb{R}^2$.


3.9 Exercises 

Exercise 3.1 Show that the correlation function of a $\mathbb{C}^d$-valued random function $X$ defined on $\mathbb{R}^{d'}$ has the properties $r_{i,j}(s, t) = r_{j,i}(t, s)^*$, $|r_{i,j}(s, t)|^2 \leq r_{i,i}(s, s)\,r_{j,j}(t, t)$, and $\sum_{k,l=1}^{n} r_{i,i}(t_k, t_l)\,z_k z_l^*$ is real and positive for arbitrary $n \geq 1$, $t_k \in \mathbb{R}^{d'}$, and $z_k \in \mathbb{C}$.

Exercise 3.2 Show that the process $X(t)$ in Example 3.14 is Gaussian if $\{A_k\}$ and $\{B_k\}$ are independent Gaussian variables. Find the finite dimensional distributions of $X(t)$ under the assumptions that $\{A_k\}$ and $\{B_k\}$ are independent and dependent Gaussian variables.

Exercise 3.3 Suppose the argument $t$ in Example 3.13 is a spatial coordinate $t \in \mathbb{R}^{d'}$, so that $X$ is a real-valued random field. Show that $X$ is weakly homogeneous with mean $E[X(t)] = 0$ and covariance function $c(s, t) = \sum_{k=1}^{n} \sigma_k^2 \cos(\nu_k \cdot (s - t))$, where $\nu_k \in \mathbb{R}^{d'}$ and $\sigma_k > 0$ are constants.

Exercise 3.4 Show that the correlation function $r(\tau) = E[X(t + \tau)X(t)^*]$ of a complex-valued weakly stationary process $X$ satisfies the condition $r(\tau) = r(-\tau)^*$.

Exercise 3.5 Let $X$ be a real-valued weakly homogeneous random field defined on $\mathbb{R}^{d'}$. Show that the spectral distribution $S(\nu)$ of this field given by (3.17) satisfies the condition $\int_{\mathbb{R}^{d'}} \sin(\tau \cdot \nu)\,dS(\nu) = 0$.

Exercise 3.6 Construct examples of m.s. continuous and m.s. differentiable 
stochastic processes and random fields. 

Exercise 3.7 Complete the proof of Theorem 3.13. 

Exercise 3.8 Show that Y in Example 3.29 satisfies the differential equation 
Y (t) = g{t) Y (t), t > 0, with initial condition Y (0) = 0, if h is differentiable 
and g(t ) = dh(t)/dt. 

Exercise 3.9 Show that $X(t) = \int_{\mathbb{R}^{d'}} \left[ \cos(t \cdot \nu)\,dU(\nu) + \sin(t \cdot \nu)\,dV(\nu) \right]$ is an alternative representation for the random field $X$ in Theorem 3.20, where $U$ and $V$ are real-valued random fields satisfying the conditions $E[dU(\nu)] = E[dV(\nu)] = 0$, $E[dU(\nu)\,dV(\nu')] = 0$, and $E[dU(\nu)\,dU(\nu')] = E[dV(\nu)\,dV(\nu')] = dS(\nu)\,\delta(\nu - \nu')$.

Exercise 3.10 Show that the random variables $\{X_k\}$ in the Karhunen-Loeve expansion of a random function satisfying the conditions of Theorem 3.22 have the properties given by (3.32) and $E[(X(s) - X^{(n)}(s))(X(t) - X^{(n)}(t))^*] = r(s, t) - \sum_{k=1}^{n} \lambda_k \varphi_k(s)\varphi_k(t)^* \to 0$ as $n \to \infty$, where $X^{(n)}(t) = \sum_{k=1}^{n} \lambda_k^{1/2} X_k \varphi_k(t)$.

Exercise 3.11 Show that a weakly stationary Gaussian function is also stationary 
and that linear transformations of Gaussian functions are Gaussian. 

Exercise 3.12 Calculate the scaled covariance functions in Examples 3.38 and 3.39. 

Exercise 3.13 Let $\{X_n, n = 0, 1,\ldots\}$ be a Markov chain taking values in $S = \{1, 2,\ldots,q\}$. Prove the relationships in (3.41).

Exercise 3.14 Prove the Chapman-Kolmogorov equation in Theorem 3.30 by using 
properties of conditional densities and Markov processes. 

Exercise 3.15 Show that the expectation of both discrete and continuous time sub- 
martingales, martingales, and supermartingales are increasing, constant, and decreas- 
ing functions of time. 

Exercise 3.16 Find the characteristic functions for Poisson and compound Poisson 
processes. 

Exercise 3.17 Show that the period between consecutive jumps of a Poisson process $N$ with intensity $\lambda$ is an exponential random variable with mean $1/\lambda$.

Exercise 3.18 Find the quadratic variation process for $B + \tilde{C}$, where B is Brownian
motion, $\tilde{C}(t) = C(t) - \lambda t E[Y_1]$, $C(t) = \sum_{k=1}^{N(t)} Y_k$, $\{Y_k\}$ are iid random variables
with finite mean, and $N(t)$ denotes a Poisson process with intensity $\lambda > 0$. Assume
that B and C are independent of each other.

Exercise 3.19 Find the quadratic variation process for the approximate representa- 
tion L a a (l) given by (3.62) for an a-stable process L a (t). 
Chapter 4

Stochastic Integrals 


4.1 Introduction 

The states of most physical systems are random functions that can be defined as
solutions of differential equations driven by uncertain actions. For example, stochastic
processes can be used to characterize vibrations in buildings induced by wind
and stresses in aircraft since wind loads and flight conditions fluctuate randomly in
space and time.

Let $X(t)$, $t \ge 0$, be an $\mathbb{R}^d$-valued process representing the state of a dynamic system
in a random environment. The process can be defined by a differential equation of the
type

$$dX(t) = a(X(t-))\,dt + b(X(t-))\,dY(t), \quad t \ge 0, \tag{4.1}$$

where $a(\cdot)$ and $b(\cdot)$ are $(d, 1)$ and $(d, d')$ matrices and $Y(t)$ is an $\mathbb{R}^{d'}$-valued input
process. The meaning of (4.1) is given by its integral form

$$X(t) = X(0) + \int_0^t a(X(s-))\,ds + \int_0^t b(X(s-))\,dY(s), \quad t \ge 0. \tag{4.2}$$

The use of the left limit $X(t-) = \lim_{u \uparrow t} X(u)$ in (4.2) will be clarified in a subsequent
section. Conditions for the existence and uniqueness of the solution of (4.1) and (4.2)
are discussed in Sect. 5.5.1. The integral $\int_0^t b(X(s-))\,dY(s)$ in (4.2) has random
integrand and integrator, and the formal derivative of the integrator is usually a
white noise process. We will see that this integral is not, generally, defined in the
Riemann-Stieltjes sense.

It is common in applications to assume that the formal derivative of $Y(t)$ is a white
noise process, for example, a Gaussian white noise if $Y(t)$ is an $\mathbb{R}^{d'}$-valued Brownian
motion $B(t)$. The assumption of white input is not restrictive since many colored
noise processes can be defined as outputs of linear filters to white noise. For example,
let $U(t)$ denote the displacement of a simple oscillator with damping ratio $\zeta > 0$
and natural frequency $\nu_0$ subjected to a stationary first order Gauss-Markov process
$V(t)$ with mean 0, variance 1, and correlation function $E[V(s)\,V(t)] = \exp(-\lambda\,|s - t|)$,
$\lambda > 0$, so that $V(t)$ is the solution of $dV(t) = -\lambda\,V(t)\,dt + \sqrt{2\lambda}\,dB(t)$ with
initial state $V(0) \sim N(0, 1)$ assumed to be independent of the Brownian motion
$B(t)$. The equation of motion for the oscillator, $\ddot{U}(t) + 2\,\zeta\,\nu_0\,\dot{U}(t) + \nu_0^2\,U(t) = V(t)$,
augmented with that for the input satisfies the differential equation

$$dX(t) = \begin{bmatrix} X_2(t) \\ -\nu_0^2\,X_1(t) - 2\,\zeta\,\nu_0\,X_2(t) + X_3(t) \\ -\lambda\,X_3(t) \end{bmatrix} dt + \begin{bmatrix} 0 \\ 0 \\ \sqrt{2\lambda} \end{bmatrix} dB(t) \tag{4.3}$$

with the notation $X(t) = (X_1(t) = U(t),\ X_2(t) = \dot{U}(t),\ X_3(t) = V(t))$, that is, an
equation of the type in (4.1).


4.2 Riemann-Stieltjes Integrals 

Let $f$ and $g$ be real-valued functions defined on a bounded interval $[0, t] \subset \mathbb{R}$,
$p = \{t_0, t_1, \ldots, t_n\}$, $0 = t_0 < \cdots < t_n = t$, be a partition of $[0, t]$, and $t'_k \in [t_{k-1}, t_k]$
denote intermediate points of $p$. Let

$$S_{f,p}(g) = \sum_{k=1}^n f(t'_k)\,[g(t_k) - g(t_{k-1})] \tag{4.4}$$

be the Riemann-Stieltjes sum of $f$ with respect to $g$ for the partition $p$ of $[0, t]$. The
function $f$ is Riemann-Stieltjes integrable with respect to $g$ on $[0, t]$ if there is a
number $a$ such that, for every $\varepsilon > 0$, there is a partition $p_\varepsilon$ of $[0, t]$ with the property
$|S_{f,p}(g) - a| < \varepsilon$ for every refinement $p \supseteq p_\varepsilon$ and every choice of intermediate points ([1], p.
141). The number $a$ is called the Riemann-Stieltjes integral of $f$ with respect to $g$ on
$[0, t]$ and is denoted by $\int_0^t f(s)\,dg(s)$ or $\int_0^t f\,dg$. It can be shown that $\int_0^t f\,dg$ exists if
(1) $f$ and $g$ have no discontinuities at the same point $s \in [0, t]$ and (2) $f$ and $g$ are of
bounded $p$-variation and $q$-variation, respectively, for some $p > 0$ and $q > 0$ such
that $1/p + 1/q > 1$ ([5], p. 94). A real-valued function $h$ is of bounded $r$-variation
on $[0, t]$ if $\sup \sum_k |h(t_k) - h(t_{k-1})|^r < \infty$, where $r > 0$ and the supremum is
taken over all partitions of the interval $[0, t]$. If $r = 1$, $h$ is said to be of bounded
variation on $[0, t]$. Real-valued continuously differentiable functions on a closed bounded
interval of the real line are of bounded variation. However, continuous functions may not have this
property. For example, the continuous function $h : [0, 1] \to \mathbb{R}$ defined by $h(x) = 0$
if $x = 0$ and $h(x) = x \cos(\pi/x)$ if $x \ne 0$ is not of bounded variation.
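The unbounded variation of $h(x) = x \cos(\pi/x)$ can be seen numerically. The following minimal sketch evaluates the sum $\sum_k |h(t_k) - h(t_{k-1})|$ on the partitions $\{0, 1/n, \ldots, 1/2, 1\}$ of $[0, 1]$; the function names and the choice of partition are illustrative assumptions, not part of the text. Since $h(1/k) = (-1)^k/k$, the sums grow like twice the harmonic series and therefore have no finite supremum.

```python
import math


def h(x):
    """Continuous function h(x) = x*cos(pi/x) for x != 0 and h(0) = 0."""
    return 0.0 if x == 0.0 else x * math.cos(math.pi / x)


def variation_on_reciprocal_partition(n):
    """Sum of |h(t_k) - h(t_{k-1})| over the partition {0, 1/n, ..., 1/2, 1}."""
    points = [0.0] + [1.0 / k for k in range(n, 0, -1)]
    return sum(abs(h(b) - h(a)) for a, b in zip(points, points[1:]))


if __name__ == "__main__":
    # The sums keep increasing (roughly like 2*ln(n)) as the partition is refined near 0.
    for n in (10, 100, 1000, 10000):
        print(n, round(variation_on_reciprocal_partition(n), 3))
```

Any partition gives a lower bound on the total variation, so sums that increase without bound along one family of partitions are enough to rule out bounded variation.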
The Riemann-Stieltjes integral has the properties

$$\int_0^t (c_1 f_1 + c_2 f_2)\,dg = c_1 \int_0^t f_1\,dg + c_2 \int_0^t f_2\,dg,$$

$$\int_0^t f\,d(c_1 g_1 + c_2 g_2) = c_1 \int_0^t f\,dg_1 + c_2 \int_0^t f\,dg_2, \quad \text{and}$$

$$\int_0^t f\,dg = \int_0^s f\,dg + \int_s^t f\,dg, \tag{4.5}$$

provided $f_1$, $f_2$ are Riemann-Stieltjes integrable with respect to $g$ on $[0, t]$ and $f$
is Riemann-Stieltjes integrable with respect to $g$, $g_1$, $g_2$ on $[0, t]$, where $c_1$, $c_2$ are
constants and $s \in (0, t)$ ([1], Theorems 7.2, 7.3, and 7.4).

It is natural to attempt a construction of the Riemann-Stieltjes integral for the
more general case in which the integrand and the integrator are random functions.
Suppose first that the integrator is a Brownian motion B. The samples $s \mapsto B(s, \omega)$
of B are of bounded $q$-variation on any finite interval $[0, t]$ for $q > 2$ [8]. Hence,
$\int_0^t f(s)\,dB(s, \omega)$ exists as a Riemann-Stieltjes integral for almost all sample paths of B
if $f$ is of bounded variation since $1/p + 1/q > 1$ for $p = 1$ and $q > 2$. The definition
of $\int_0^t f\,dB$ as a Riemann-Stieltjes integral corresponding to the sample paths of B is
referred to as path-by-path definition. The integrals

$$\int_0^t e^s\,dB(s, \omega), \quad \int_0^t \cos(s)\,dB(s, \omega), \quad \text{and} \quad \int_0^t s^k\,dB(s, \omega)$$

exist as Riemann-Stieltjes integrals for almost all samples of the Brownian motion
because the functions $e^s$, $\cos(s)$, and $s^k$ are of bounded variation.

Consider now more general integrals, for example, $\int_0^t B\,dB$ and $\int_0^t f\,dB$, where
$f$ is an arbitrary continuous function. The path-by-path Riemann-Stieltjes integral
$\int_0^t B(s, \omega)\,dB(s, \omega)$ does not exist since the samples of B are of bounded $q$-variation
for $q > 2$ so that the condition $1/p + 1/q > 1$ is not satisfied. The integral
$\int_0^t f(s)\,dB(s, \omega)$ cannot be defined as a path-by-path Riemann-Stieltjes integral
for $f$ continuous since $\int_0^t f(s)\,dg(s)$ does not exist for all continuous functions
$f$ on $[0, t]$ unless $g$ is of bounded variation ([6], Theorem 52, p. 40). We need an
alternative definition for $\int_0^t B\,dB$, $\int_0^t f\,dB$, and other more general integrals with
random integrands and integrators.


4.3 Stochastic Integrals $\int B\,dB$ and $\int N\,dN$


Our objective is to extend the Riemann-Stieltjes integral to stochastic integrals of the
type $\int_I X\,dY$, where X and Y are real-valued stochastic processes and I is an interval
of the real line.

Stochastic integrals are defined by limits of sequences of sums resembling the
Riemann-Stieltjes sums in (4.4). There are two significant differences between the
definition of stochastic integrals and Riemann-Stieltjes integrals. First, the
sums defining stochastic integrals are random so that convergence criteria for
sequences of random variables have to be used to construct stochastic integrals.
Second, the limit of the sequence of sums defining stochastic integrals may depend
on the selection of the intermediate points $\{t'_k\}$, in contrast to Riemann-Stieltjes sums
whose limits are independent of the particular selection of these points.

The stochastic integrals $\int_0^t B\,dB$ and $\int_0^t N\,dN$ are used to illustrate both the
construction of stochastic integrals and differences between stochastic and
Riemann-Stieltjes integrals, where B and N denote the Brownian motion and Poisson processes,
respectively. Two stochastic integrals are defined for particular selections of the
intermediate points, the Ito and the Stratonovich integrals. It is shown that (1)
the Ito integral of the Brownian motion with respect to itself is a martingale
while the corresponding Stratonovich integral is not and (2) the Ito and the
pathwise Riemann-Stieltjes definitions of the integral $\int_0^t N(s-)\,dN(s)$ coincide, where
$N(s-) = \lim_{u \uparrow s} N(u)$.

Example 4.1 Consider the random sequence

$$J_{B,n}(B) = \sum_{k=1}^n B(t_{k-1})\,[B(t_k) - B(t_{k-1})],$$

where $p_n = \{t_0, t_1, \ldots, t_n\}$, $0 = t_0 < t_1 < \cdots < t_n = t$, is a sequence of partitions
of $[0, t]$ with intermediate points $t'_k = t_{k-1}$ such that
$\Delta(p_n) = \max_{1 \le k \le n} (t_k - t_{k-1}) \to 0$ as $n \to \infty$. The limit of $J_{B,n}(B)$ as $n \to \infty$ exists in m.s. and in probability.
It is denoted by $\int_0^t B(s)\,dB(s)$ or $\int_0^t B\,dB$, has the expression

$$\int_0^t B(s)\,dB(s) = \frac{1}{2}\,(B(t)^2 - t), \quad t \ge 0, \tag{4.6}$$

and is called the Ito or the stochastic integral of B with respect to B on $[0, t]$. If,
in addition, the sequence of partitions of $[0, t]$ is refining, that is, $p_n \subset p_{n+1}$, then
$J_{B,n}(B)$ converges a.s. to the limit in (4.6). The Ito integral differs from $B(t)^2/2$, that
is, the integral $\int_0^t B\,dB$ delivered by the formal use of classical calculus. ◊

Proof The random variable $J_{B,n}(B)$ can be given in the form

$$J_{B,n}(B) = \frac{1}{2} \sum_{k=1}^n \left(B(t_k)^2 - B(t_{k-1})^2\right) - \frac{1}{2} \sum_{k=1}^n \left(B(t_k) - B(t_{k-1})\right)^2 = \frac{1}{2}\,B(t)^2 - \frac{1}{2} \sum_{k=1}^n \left(B(t_k) - B(t_{k-1})\right)^2,$$

where the last equality holds since $\sum_{k=1}^n (B(t_k)^2 - B(t_{k-1})^2)$ is a telescopic series
whose sum is $B(t)^2$. The m.s. limit of $J_{B,n}(B)$ is $(B(t)^2 - t)/2$ since

$$E\left[\left(J_{B,n}(B) - \frac{1}{2}\,(B(t)^2 - t)\right)^2\right] = E\left[\left(-\frac{1}{2} \sum_k (\Delta B_k)^2 + \frac{t}{2}\right)^2\right]
= \frac{1}{4} \sum_{k,l} E\left[(\Delta B_k)^2 (\Delta B_l)^2\right] - \frac{t}{2} \sum_k E\left[(\Delta B_k)^2\right] + \frac{t^2}{4} \to 0, \quad \text{as } n \to \infty,$$

where $\Delta B_k = B(t_k) - B(t_{k-1})$ (Exercise 4.1). Hence, the Ito integral $\int_0^t B\,dB$ can be
defined as the m.s. limit of $J_{B,n}(B)$, and as the limit in probability of $J_{B,n}(B)$ since
convergence in probability is implied by m.s. convergence.

Since $\lim_{n \to \infty} \sum_{k=1}^n [B(t_k) - B(t_{k-1})]^2 = t$ holds a.s. for a sequence of refining
partitions of $[0, t]$ [6] (Theorem 28, p. 18), $J_{B,n}(B)$ converges a.s. to $(B(t)^2 - t)/2$
for refining partitions so that the Ito integral $\int_0^t B\,dB$ can be defined as the a.s. limit
of $J_{B,n}(B)$ for these types of partitions. ▲
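The limit in (4.6) can be checked on a simulated sample path. The sketch below is an illustration under stated assumptions rather than part of the text: it uses a uniform partition, Gaussian increments generated with Python's standard `random` module, and the hypothetical function name `ito_sum_brownian`. It returns the left-endpoint sum together with $B(t)$ for the same path, so that the sum can be compared with $(B(t)^2 - t)/2$ and with the classical-calculus value $B(t)^2/2$.

```python
import math
import random


def ito_sum_brownian(t=1.0, n=20000, seed=0):
    """Return (J, B_t): the left-endpoint sum sum_k B(t_{k-1}) [B(t_k) - B(t_{k-1})]
    on a uniform partition of [0, t], and the terminal value B(t) of the same path."""
    rng = random.Random(seed)
    dt = t / n
    b = 0.0  # B(t_{k-1}), starting from B(0) = 0
    j = 0.0
    for _ in range(n):
        db = rng.gauss(0.0, math.sqrt(dt))  # increment B(t_k) - B(t_{k-1}) ~ N(0, dt)
        j += b * db                         # integrand evaluated at the left endpoint
        b += db
    return j, b


if __name__ == "__main__":
    j, b_t = ito_sum_brownian()
    print("left-endpoint sum :", j)
    print("(B(t)^2 - t)/2    :", 0.5 * (b_t ** 2 - 1.0))
    print("B(t)^2 / 2        :", 0.5 * b_t ** 2)
```

For a fine partition the first two printed values should be close, while the third differs from them by approximately $t/2$.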

Example 4.2 Consider the random sequence

$$\tilde{J}_{B,n}(B) = \sum_{k=1}^n B(t'_k)\,[B(t_k) - B(t_{k-1})],$$

where $t'_k = (1 - \theta)\,t_{k-1} + \theta\,t_k$, $\theta \in [0, 1]$. $\tilde{J}_{B,n}(B)$ differs from $J_{B,n}(B)$ in Example
4.1 by the choice of the intermediate points, that is, $t'_k$ instead of $t_{k-1}$. The limit

$$\lim_{n \to \infty} \tilde{J}_{B,n}(B) = \frac{1}{2}\,B(t)^2 + (\theta - 1/2)\,t \tag{4.7}$$

exists in m.s. and probability. For $\theta = 1/2$, this limit, denoted by $\int_0^t B(s) \circ dB(s)$ or
$\int_0^t B \circ dB$ and called the Stratonovich integral of B with respect to B on $[0, t]$, is

$$\int_0^t B(s) \circ dB(s) = \int_0^t B \circ dB = \frac{1}{2}\,B(t)^2, \tag{4.8}$$

and coincides with the result obtained by the formal use of classical calculus. ◊
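Proof An alternative form of $\tilde{J}_{B,n}(B)$ is

$$\tilde{J}_{B,n}(B) = \sum_{k=1}^n B(t_{k-1})\,\Delta B_k + \sum_{k=1}^n \Delta B'_k\,\Delta B_k$$

with the notations $\Delta t_k = t_k - t_{k-1}$,

$$\Delta B_k = B(t_k) - B(t_{k-1}) \sim N(0, \Delta t_k),$$
$$\Delta B'_k = B(t'_k) - B(t_{k-1}) \sim N(0, \theta\,\Delta t_k), \quad \text{and}$$
$$\Delta B''_k = B(t_k) - B(t'_k) \sim N(0, (1 - \theta)\,\Delta t_k)$$

for a fixed $\theta \in [0, 1]$. We have seen that the first term of $\tilde{J}_{B,n}(B)$ converges in m.s.
to $(B(t)^2 - t)/2$ as $n \to \infty$ and its limit defines the Ito integral in (4.6). It remains
to show that the second term in $\tilde{J}_{B,n}(B)$, that is, $\sum_{k=1}^n \Delta B'_k\,\Delta B_k$, converges in m.s.
to $\theta\,t$ since $X_n + Y_n \to X + Y$ in m.s. holds if $X_n \to X$ and $Y_n \to Y$ in m.s. The first moment
of the second term of $\tilde{J}_{B,n}(B)$ is

$$E\left[\sum_{k=1}^n \Delta B'_k\,\Delta B_k\right] = \sum_k E[(\Delta B'_k)^2] + \sum_k E[\Delta B'_k\,\Delta B''_k] = \sum_k \theta\,\Delta t_k = \theta\,t$$

since $\Delta B_k = \Delta B'_k + \Delta B''_k$ and $E[\Delta B'_k\,\Delta B''_k] = E[\Delta B'_k]\,E[\Delta B''_k] = 0$. The second
moment of this term,

$$E\left[\left(\sum_{k=1}^n \Delta B'_k\,\Delta B_k\right)^2\right] = E\left[\left(\sum_{k=1}^n (\Delta B'_k)^2 + \sum_{k=1}^n \Delta B'_k\,\Delta B''_k\right)^2\right]$$
$$= \sum_{k \ne l} E[(\Delta B'_k)^2 (\Delta B'_l)^2] + \sum_k E[(\Delta B'_k)^4] + \sum_{k,l} E[\Delta B'_k\,\Delta B''_k\,\Delta B'_l\,\Delta B''_l] + 2 \sum_{k,l} E[(\Delta B'_k)^2\,\Delta B'_l\,\Delta B''_l]$$
$$= \theta^2 \sum_{k \ne l} \Delta t_k\,\Delta t_l + 3\,\theta^2 \sum_k (\Delta t_k)^2 + \theta\,(1 - \theta) \sum_k (\Delta t_k)^2$$
$$= \theta^2 \sum_{k,l} \Delta t_k\,\Delta t_l + 2\,\theta^2 \sum_k (\Delta t_k)^2 + \theta\,(1 - \theta) \sum_k (\Delta t_k)^2,$$

converges to $(\theta\,t)^2$ as $n \to \infty$ since $\sum_{k,l} \Delta t_k\,\Delta t_l = t^2$ and $\sum_k (\Delta t_k)^2 \le \max_l \{\Delta t_l\} \sum_k \Delta t_k$,
$\max_l \{\Delta t_l\} \to 0$ as $\Delta(p_n) \to 0$, and $\sum_k \Delta t_k = t$. Hence, the variance of
$\sum_{k=1}^n \Delta B'_k\,\Delta B_k$ vanishes as $n \to \infty$, so that this sum
converges in m.s. to $\theta\,t$ and the m.s. limit of $\tilde{J}_{B,n}(B)$ exists and is

$$\lim_{n \to \infty} \tilde{J}_{B,n}(B) = \frac{1}{2}\,(B(t)^2 - t) + \theta\,t = \frac{1}{2}\,B(t)^2 + (\theta - 1/2)\,t.$$

This limit coincides with the Ito integral for $\theta = 0$ but differs from this integral for
$\theta \ne 0$. The Stratonovich integral corresponds to $\theta = 1/2$. ▲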
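The dependence of the limit on the intermediate points can also be observed numerically. The following sketch, with the hypothetical function name `theta_sum_brownian`, assumes a uniform partition and generates the two independent Gaussian pieces of each increment, $B(t'_k) - B(t_{k-1})$ and $B(t_k) - B(t'_k)$, with variances $\theta\,\Delta t_k$ and $(1 - \theta)\,\Delta t_k$. It is meant as an illustration of (4.7), not as a general-purpose integrator.

```python
import math
import random


def theta_sum_brownian(theta, t=1.0, n=20000, seed=0):
    """Return (J, B_t) for the sum sum_k B(t'_k) [B(t_k) - B(t_{k-1})] with
    t'_k = (1 - theta) t_{k-1} + theta t_k on a uniform partition of [0, t]."""
    if not 0.0 <= theta <= 1.0:
        raise ValueError("theta must lie in [0, 1]")
    rng = random.Random(seed)
    dt = t / n
    sd1 = math.sqrt(theta * dt)          # std of B(t'_k) - B(t_{k-1})
    sd2 = math.sqrt((1.0 - theta) * dt)  # std of B(t_k) - B(t'_k)
    b = 0.0  # B(t_{k-1})
    j = 0.0
    for _ in range(n):
        db1 = rng.gauss(0.0, sd1)
        db2 = rng.gauss(0.0, sd2)
        j += (b + db1) * (db1 + db2)     # B(t'_k) times the full increment over [t_{k-1}, t_k]
        b += db1 + db2
    return j, b


if __name__ == "__main__":
    for theta in (0.0, 0.5, 1.0):
        j, b_t = theta_sum_brownian(theta)
        print(theta, j, 0.5 * b_t ** 2 + (theta - 0.5) * 1.0)
```

With $\theta = 0$ the sum approximates the Ito integral, with $\theta = 1/2$ the Stratonovich integral, and with $\theta = 1$ the value $B(t)^2/2 + t/2$.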

Example 4.3 Denote by

$$J_{B,n}(B)(s) = \sum_{k=1}^n B(t_{k-1})\,[B(t_k \wedge s) - B(t_{k-1} \wedge s)] \quad \text{and}$$
$$\tilde{J}_{B,n}(B)(s) = \sum_{k=1}^n B(t'_k)\,[B(t_k \wedge s) - B(t_{k-1} \wedge s)]$$

the restrictions of $J_{B,n}(B)$ and $\tilde{J}_{B,n}(B)$ to intervals $[0, s]$, $s \le t$. Then $J_{B,n}(B)(s)$
is an $\mathcal{F}_s$-square integrable martingale while $\tilde{J}_{B,n}(B)(s)$ is not a martingale, where
$\mathcal{F}_s = \sigma(B(u) : 0 \le u \le s)$ and $t'_k = (t_{k-1} + t_k)/2$. ◊

Proof That $J_{B,n}(B)(s)$ and $\tilde{J}_{B,n}(B)(s)$ have finite second moments follows from
Examples 4.1 and 4.2. Let $k(s)$ be an index such that $s \in [t_{k(s)-1}, t_{k(s)}]$. Since
$J_{B,n}(B)(s) = \sum_{k=1}^{k(s)-1} B(t_{k-1})\,[B(t_k) - B(t_{k-1})] + B(t_{k(s)-1})\,[B(s) - B(t_{k(s)-1})]$
depends on $B(u)$, $u \le s$, it is $\mathcal{F}_s = \sigma(B(u), 0 \le u \le s)$-adapted. For $0 \le s < \sigma \le t$
such that $s < \sigma$ are in the same interval $[t_{m-1}, t_m]$ of a partition $p_n$ of $[0, t]$, we have
$J_{B,n}(B)(\sigma) = J_{B,n}(B)(s) + B(t_{m-1})\,(B(\sigma) - B(s))$ so that
$E[J_{B,n}(B)(\sigma) \mid \mathcal{F}_s] = J_{B,n}(B)(s) + B(t_{m-1})\,E[B(\sigma) - B(s) \mid \mathcal{F}_s] = J_{B,n}(B)(s)$ by properties of Brownian
motion and conditional expectation. For $0 \le s < \sigma \le t$ such that $s$ and $\sigma$ belong to
distinct intervals of $p_n$ of $[0, t]$, we have

$$J_{B,n}(B)(\sigma) - J_{B,n}(B)(s) = B(t_{k(s)-1})\,(B(t_{k(s)}) - B(s)) + \sum_{k=k(s)+1}^n B(t_{k-1})\,(B(t_k \wedge \sigma) - B(t_{k-1} \wedge \sigma))$$

so that $E[J_{B,n}(B)(\sigma) - J_{B,n}(B)(s) \mid \mathcal{F}_s] = 0$ since
$E[B(t_{k(s)-1})\,(B(t_{k(s)}) - B(s)) \mid \mathcal{F}_s] = B(t_{k(s)-1})\,E[B(t_{k(s)}) - B(s) \mid \mathcal{F}_s] = 0$ and
$E[\sum_{k=k(s)+1}^n B(t_{k-1})\,\Delta B_{k,\sigma} \mid \mathcal{F}_s] = \sum_{k=k(s)+1}^n E\{B(t_{k-1})\,E[\Delta B_{k,\sigma} \mid \mathcal{F}_{t_{k-1}}] \mid \mathcal{F}_s\} = 0$
since $E[\Delta B_{k,\sigma} \mid \mathcal{F}_{t_{k-1}}] = 0$, where $\Delta B_{k,\sigma} = B(t_k \wedge \sigma) - B(t_{k-1} \wedge \sigma)$. Hence, $J_{B,n}(B)(s)$ has the
martingale property $E[J_{B,n}(B)(\sigma) \mid \mathcal{F}_s] = J_{B,n}(B)(s)$ for any $\sigma \ge s$. That $J_{B,n}(B)$ is a
martingale follows from Theorem 4.4 discussed in Sect. 4.4.2. The process $\tilde{J}_{B,n}(B)(s)$
is not adapted with respect to the filtration generated by B, for example, $\tilde{J}_{B,n}(B)(s)$
with $\theta = 1/2$ depends on $B(u)$, $u > s$, for $s \in (t_{k-1}, (t_{k-1} + t_k)/2)$, so that it is not
a martingale. ▲

Example 4.4 Consider the random sequence

$$J_{N,n}(N) = \sum_{k=1}^n N(t_{k-1})\,[N(t_k) - N(t_{k-1})],$$

where $p_n = \{t_0, t_1, \ldots, t_n\}$, $0 = t_0 < t_1 < \cdots < t_n = t$, is a sequence of partitions
of $[0, t]$ such that $\Delta(p_n) \to 0$ as $n \to \infty$ and N is a Poisson process with intensity
$\lambda > 0$. Then $J_{N,n}(N)$ converges in m.s. and a.s. as $n \to \infty$. The limit of $J_{N,n}(N)$,
denoted by $\int_0^t N(s-)\,dN(s)$ or $\int_0^t N_-\,dN$ and called the Ito integral of $N_-$ with
respect to N on $[0, t]$, is

$$\int_0^t N_-(s)\,dN(s) = \int_0^t N(s-)\,dN(s) = \frac{1}{2}\,(N(t)^2 - N(t)), \tag{4.9}$$

where $N_-(s) = N(s-) = \lim_{u \uparrow s} N(u)$. The Ito integral $\int_0^t N_-\,dN$ coincides with the
path-by-path Riemann-Stieltjes integral. ◊

Proof Let $\alpha$ and $\beta$ be real-valued functions on an interval $[a, b]$ such that $\beta$ is a step function with jumps
$\beta_k$ at $x_k$, $k = 1, 2, \ldots, n$. If $\alpha$ and $\beta$ are not discontinuous from the right or from the
left at each $x_k$ simultaneously, then the Riemann-Stieltjes integral $\int_a^b \alpha(x)\,d\beta(x)$ is
given by $\int_a^b \alpha(x)\,d\beta(x) = \sum_{k=1}^n \alpha(x_k)\,\beta_k$ ([1], Theorem 7.11). We have

$$\int_0^t N(s-, \omega)\,dN(s, \omega) = \sum_{k=1}^{N(t,\omega)} N(T_{k-1}(\omega), \omega)\,[N(T_k(\omega), \omega) - N(T_{k-1}(\omega), \omega)] = \sum_{k=1}^{N(t,\omega)} (k - 1) = \frac{1}{2}\,(N(t, \omega)^2 - N(t, \omega)),$$

where $T_k$, $k = 1, 2, \ldots$, denote the jump times of N and $T_0 = 0$, that is, the
path-by-path integral $\int_0^t N(s-, \omega)\,dN(s, \omega)$ coincides with the Ito integral in agreement
with a result in [6] (Theorem 17, p. 54), and $J_{N,n}(N)$ converges a.s. to $\int_0^t N_-\,dN$ as
$n \to \infty$.

The m.s. convergence of $J_{N,n}(N)$ to $\int_0^t N_-\,dN = (N(t)^2 - N(t))/2$ follows by
direct calculations and elementary arguments. We need to show that the expectation
of the square of

$$J_{N,n}(N) - \frac{1}{2}\,(N(t)^2 - N(t)) = \frac{1}{2} \sum_k \left[\Delta N_k - (\Delta N_k)^2\right]$$

converges to zero as $n \to \infty$, where $\Delta N_k = N(t_k) - N(t_{k-1})$, that is, the sum

$$\sum_{k,l} E\left[\left(\Delta N_k - (\Delta N_k)^2\right)\left(\Delta N_l - (\Delta N_l)^2\right)\right]$$
$$= \sum_{k,l} E\left[\Delta N_k\,\Delta N_l - \Delta N_k\,(\Delta N_l)^2 - \Delta N_l\,(\Delta N_k)^2 + (\Delta N_k)^2\,(\Delta N_l)^2\right]$$
$$= \lambda^4 \sum_{k,l} (\Delta t_k)^2\,(\Delta t_l)^2 + 2\,\lambda^2 \sum_k (\Delta t_k)^2 + 4\,\lambda^3 \sum_k (\Delta t_k)^3$$

converges to zero as $\Delta t_k = t_k - t_{k-1} \to 0$. The first four moments of the increments
of N in the above expression result from the cumulants of $\Delta N_k$ that are equal to
$\lambda\,\Delta t_k$ for any order $r \ge 1$ and the relationships $\mu_1 = \chi_1$, $\mu_2 = \chi_2 + \chi_1^2$,
$\mu_3 = \chi_3 + 3\,\chi_1\,\chi_2 + \chi_1^3$, and $\mu_4 = \chi_4 + 3\,\chi_2^2 + 4\,\chi_1\,\chi_3 + 6\,\chi_1^2\,\chi_2 + \chi_1^4$ between cumulants
and moments $\mu_r = E[(\Delta N_k)^r]$ ([3], p. 377). The cumulant of order $r$ of a random
variable with characteristic function $\varphi$ is $i^{-r}\,d^r[\ln(\varphi(u))]/du^r$ for $u = 0$. ▲
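The path-by-path statement in Example 4.4 can be illustrated with a short computation. In the sketch below the function names (`poisson_jump_times`, `count`, `left_endpoint_sum`) are assumptions made for illustration; the Poisson path is represented by its jump times, generated from iid exponential inter-arrival times. When the partition is fine enough that no interval contains more than one jump, the left-endpoint sum equals $(N(t)^2 - N(t))/2$ exactly; on coarse partitions it may differ.

```python
import random


def poisson_jump_times(lam, t, seed=0):
    """Jump times in (0, t] of a Poisson process with intensity lam,
    built from iid exponential inter-arrival times with mean 1/lam."""
    rng = random.Random(seed)
    times, s = [], rng.expovariate(lam)
    while s <= t:
        times.append(s)
        s += rng.expovariate(lam)
    return times


def count(jump_times, s):
    """N(s): number of jumps in [0, s]."""
    return sum(1 for x in jump_times if x <= s)


def left_endpoint_sum(jump_times, t, n):
    """sum_k N(t_{k-1}) [N(t_k) - N(t_{k-1})] on a uniform partition of [0, t]."""
    grid = [t * k / n for k in range(n + 1)]
    values = [count(jump_times, s) for s in grid]
    return sum(values[k - 1] * (values[k] - values[k - 1]) for k in range(1, n + 1))


if __name__ == "__main__":
    jumps = poisson_jump_times(lam=5.0, t=1.0, seed=1)
    n_t = len(jumps)
    print("N(t) =", n_t, " (N(t)^2 - N(t))/2 =", (n_t * n_t - n_t) // 2)
    for n in (2, 10, 100, 10000):
        print(n, left_endpoint_sum(jumps, 1.0, n))
```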


4.4 Stochastic Integrals with Brownian Motion Integrators 

We define stochastic integrals $I(X) = \int_0^\tau X(s)\,dB(s)$ and $I(X)(t) = \int_0^t X(s)\,dB(s)$,
$t \in [0, \tau]$, representing random variables and stochastic processes, where X is a
stochastic process satisfying some conditions and B denotes a standard Brownian
motion.

Let $X(t)$ and $B(t)$, $t \in [0, \tau]$, be a real-valued stochastic process and a Brownian
motion defined on a probability space $(\Omega, \mathcal{F}, P)$. It is assumed that X is
$\mathcal{F}_t = \sigma(B(s) : 0 \le s \le t)$-adapted and $\mathcal{B}([0, \tau]) \times \mathcal{F}$-measurable. Set

$$\mathcal{H}_0^2 = \mathcal{H}_0^2[0, \tau] = \{X : [0, \tau] \times \Omega \to \mathbb{R},\ \text{step process defined by (4.11)}\},$$
$$\mathcal{H}^2 = \mathcal{H}^2[0, \tau] = \left\{X : [0, \tau] \times \Omega \to \mathbb{R},\ E\left[\int_0^\tau X(t, \omega)^2\,dt\right] < \infty\right\},$$
$$\mathcal{H} = \mathcal{H}[0, \tau] = \left\{X : [0, \tau] \times \Omega \to \mathbb{R},\ P\left(\int_0^\tau X(t, \omega)^2\,dt < \infty\right) = 1\right\}. \tag{4.10}$$

The members of $\mathcal{H}_0^2$ have the form

$$X(t, \omega) = \sum_{i=1}^n A_i(\omega)\,1(t_{i-1} < t \le t_i), \tag{4.11}$$

where $0 = t_0 < t_1 < \cdots < t_n = \tau$ defines a partition of $[0, \tau]$, $A_i \in \mathcal{F}_{t_{i-1}}$,
and $E[A_i^2] < \infty$. The processes in $\mathcal{H}_0^2$ have left continuous samples with right
limits. We have $\mathcal{H}_0^2 \subset \mathcal{H}^2 \subset \mathcal{H}$ since
$E[\int_0^\tau X(t, \omega)^2\,dt] = E[\int_0^\tau \sum_{i,j=1}^n A_i\,A_j\,1(t_{i-1} < t \le t_i)\,1(t_{j-1} < t \le t_j)\,dt] = \sum_{i=1}^n E[A_i^2] \int_0^\tau 1(t_{i-1} < t \le t_i)\,dt \le \tau \sum_{i=1}^n E[A_i^2] < \infty$
for all $X \in \mathcal{H}_0^2$ and $P(\int_0^\tau X(t, \omega)^2\,dt > a) \le E[\int_0^\tau X(t, \omega)^2\,dt]/a \to 0$
as $a \to \infty$ for all $X \in \mathcal{H}^2$ by Chebyshev's inequality.

We define the Ito integral $\int_0^\tau X\,dB$ for integrands $X \in \mathcal{H}_0^2[0, \tau]$, extend this definition
to integrands $X \in \mathcal{H}^2[0, \tau]$, define an integral process $I(X)(t)$, $t \in [0, \tau]$, for
integrands $X \in \mathcal{H}^2[0, \tau]$, and extend resulting integrals to integrands $X \in \mathcal{H}[0, \tau]$.
Our discussion is based on [4] (Chaps. 4, 5, and 6), [6] (Chap. 2), and [7] (Chaps. 6
and 7).


4.4.1 Integrands in $\mathcal{H}_0^2$

The Ito integral for step process integrands can be introduced using elementary
arguments. We define this integral and show that the mapping $X \mapsto I(X)$ is an
isometry.

Definition 4.1 The stochastic integral for $X \in \mathcal{H}_0^2$ is defined by

$$I(X) = \sum_{i=1}^n A_i\,[B(t_i) - B(t_{i-1})]. \tag{4.12}$$

Note that $\mathcal{H}_0^2$ is a linear space and that $I(X)$ is a random variable on $(\Omega, \mathcal{F}, P)$
depending linearly on X (Exercise 4.4).

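Theorem 4.1 (Ito's isometry in $\mathcal{H}_0^2$) The equality

$$\|I(X)\|_{L^2(dP)} = \|X\|_{L^2(dt \times dP)} \tag{4.13}$$

holds for all $X \in \mathcal{H}_0^2$, where $\|\cdot\|_{L^2(dP)}$ and $\|\cdot\|_{L^2(dt \times dP)}$ are the norms in
$L^2(dP) = L^2(\Omega, \mathcal{F}, P)$ and $L^2(dt \times dP) = L^2([0, \tau] \times \Omega, \mathcal{B}([0, \tau]) \times \mathcal{F}, \lambda \times P)$,
with $\lambda$ denoting the Lebesgue measure on $[0, \tau]$.

Proof We have

$$\|X\|^2_{L^2(dt \times dP)} = E\left[\int_0^\tau X(t, \omega)^2\,dt\right] = \sum_{i=1}^n E[A_i^2] \int_0^\tau 1(t_{i-1} < t \le t_i)\,dt = \sum_{i=1}^n E[A_i^2]\,(t_i - t_{i-1})$$

and

$$\|I(X)\|^2_{L^2(dP)} = \sum_{i,j=1}^n E[A_i\,A_j\,\Delta B_i\,\Delta B_j] = \sum_{i=1}^n E[A_i^2\,\Delta B_i^2] = \sum_{i=1}^n E\{A_i^2\,E[\Delta B_i^2 \mid \mathcal{F}_{t_{i-1}}]\} = \sum_{i=1}^n E[A_i^2]\,(t_i - t_{i-1})$$

by the definition of $I(X)$, properties of $\{A_i\}$ and B, and $E[A_i\,A_j\,\Delta B_i\,\Delta B_j] = 0$ for
$i \ne j$, where $\Delta B_i = B(t_i) - B(t_{i-1})$. For example, if $i > j$,
$E[A_i\,A_j\,\Delta B_i\,\Delta B_j] = E\{E[A_i\,A_j\,\Delta B_i\,\Delta B_j \mid \mathcal{F}_{t_{i-1}}]\} = E\{A_i\,A_j\,\Delta B_j\,E[\Delta B_i \mid \mathcal{F}_{t_{i-1}}]\}$ and
$E[\Delta B_i \mid \mathcal{F}_{t_{i-1}}] = 0$ by the martingale property of Brownian motion. ▲

Ito's isometry shows that $I : \mathcal{H}_0^2 \to L^2(dP)$ maps $\mathcal{H}_0^2$ continuously in $L^2(dP)$.
Since I preserves distance, Cauchy sequences in $\mathcal{H}_0^2$ are mapped into Cauchy
sequences in $L^2(dP)$.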
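Ito's isometry can be checked by Monte Carlo simulation for a particular step process. The sketch below assumes the illustrative choice $A_i = B(t_{i-1})$ on a uniform partition of $[0, \tau]$, for which $E[A_i^2] = t_{i-1}$, and compares a sample estimate of $E[I(X)^2]$ with $\sum_i E[A_i^2]\,(t_i - t_{i-1})$. The function names and sample sizes are hypothetical.

```python
import math
import random


def step_integral_sample(n, tau, rng):
    """One sample of I(X) = sum_i A_i [B(t_i) - B(t_{i-1})] with A_i = B(t_{i-1})
    on a uniform partition of [0, tau]; A_i is known at time t_{i-1}."""
    dt = tau / n
    b, total = 0.0, 0.0
    for _ in range(n):
        db = rng.gauss(0.0, math.sqrt(dt))
        total += b * db
        b += db
    return total


def isometry_check(n=4, tau=1.0, samples=20000, seed=0):
    """Return (Monte Carlo estimate of E[I(X)^2], exact sum_i E[A_i^2] (t_i - t_{i-1}))."""
    rng = random.Random(seed)
    lhs = sum(step_integral_sample(n, tau, rng) ** 2 for _ in range(samples)) / samples
    dt = tau / n
    rhs = sum((i * dt) * dt for i in range(n))  # E[B(t_{i-1})^2] = t_{i-1} = i*dt
    return lhs, rhs


if __name__ == "__main__":
    print(isometry_check())
```

The two returned numbers should agree up to sampling error, which decreases like the inverse square root of the number of samples.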


4.4.2 Integrands in $\mathcal{H}^2$

First, the Ito integral $I(X) = \int_0^\tau X(t)\,dB(t)$ is constructed for $X \in \mathcal{H}^2$ by using the
fact that $\mathcal{H}_0^2$ is dense in $\mathcal{H}^2$ ([7], Lemma 6.2). Hence, for any $X \in \mathcal{H}^2$ there exists a
sequence of step processes $X_n \in \mathcal{H}_0^2$ such that $\|X_n - X\|_{L^2(dt \times dP)} \to 0$ as $n \to \infty$.
Second, the integral $I(X)$ is extended to a stochastic process $I(X)(t)$, $t \in [0, \tau]$.

Theorem 4.2 Let $X_n \in \mathcal{H}_0^2$, $n = 1, 2, \ldots$, be a sequence of step processes
converging to $X \in \mathcal{H}^2$. The stochastic integral $I(X) = \int_0^\tau X(t)\,dB(t)$ is the m.s. limit

$$I(X) = \lim_{n \to \infty} I(X_n), \tag{4.14}$$

where $I(X_n) = \int_0^\tau X_n(t)\,dB(t)$.

Proof Since $\|X_n - X\|_{L^2(dt \times dP)} \to 0$ as $n \to \infty$, the sequence of step processes
is Cauchy in $L^2(dt \times dP)$. The Ito isometry implies that the sequence of stochastic
integrals $\{I(X_n)\}$ is Cauchy in $L^2(dP)$ so that it is convergent with limit in $L^2(dP)$
since this space is complete. Consider another sequence $\{X'_n\}$ of step processes in
$\mathcal{H}_0^2$ that converges to X in $L^2(dt \times dP)$. We have $\|X_n - X'_n\|_{L^2(dt \times dP)} \to 0$ by the
triangle inequality, which implies $\|I(X_n) - I(X'_n)\|_{L^2(dP)} \to 0$ as $n \to \infty$ by Ito's
isometry. Hence, the definition of $I(X)$ in (4.14) is meaningful since it is independent
of the particular sequence of step processes $\{X_n\}$ used for its definition. ▲

Theorem 4.3 For $X \in \mathcal{H}^2$ arbitrary, we have

$$\|I(X)\|_{L^2(dP)} = \|X\|_{L^2(dt \times dP)} \quad \text{and} \tag{4.15}$$

$$E\left[\left(\int_s^t X(u)\,dB(u)\right)^2 \,\Big|\, \mathcal{F}_s\right] = E\left[\int_s^t X(u)^2\,du \,\Big|\, \mathcal{F}_s\right], \tag{4.16}$$

where $0 \le s \le t \le \tau$.

Proof Let $X_n \in \mathcal{H}_0^2$, $n = 1, 2, \ldots$, be a sequence of step processes converging
to $X \in \mathcal{H}^2$. We have $\|X_n\| \le \|X_n - X\| + \|X\|$ and $\|X\| \le \|X - X_n\| + \|X_n\|$
by the triangle inequality implying $|\,\|X_n\| - \|X\|\,| \le \|X - X_n\|$, where all norms
are in $L^2(dt \times dP)$. Hence, $\|X_n\| \to \|X\|$ as $n \to \infty$ since $\{X_n\}$ converges to X in
$L^2(dt \times dP)$. Similar arguments give the convergence $\|I(X_n)\| \to \|I(X)\|$, $n \to \infty$,
where the norms are in $L^2(dP)$. These properties and the Ito isometry in (4.13) give
(4.15).

For (4.16) it is sufficient to prove $E[1_A\,(\int_s^t X(u)\,dB(u))^2] = E[1_A \int_s^t X(u)^2\,du]$
for all $A \in \mathcal{F}_s$, that is, (4.15) with $\tilde{X}(u) = 1_A\,1(s < u \le t)\,X(u)$ in place of X. Since $\tilde{X} \in \mathcal{H}^2$, we have
$\|I(\tilde{X})\|^2_{L^2(dP)} = \|\tilde{X}\|^2_{L^2(dt \times dP)}$, that is, (4.16). ▲

We now construct the integral $I(X)(t) = \int_0^t X(s)\,dB(s)$, $t \in [0, \tau]$, with integrands
$X \in \mathcal{H}^2[0, \tau]$. The construction must account for the fact that the Ito integral $I(X)$
in (4.14) has been defined as a m.s. limit of a sequence of integrals with step process
integrands, so that it takes arbitrary values on a set of zero probability measure. For
a fixed $t \in [0, \tau]$, the definition

$$I(X)(t) = \int_0^t X(s)\,dB(s) = \int_0^\tau 1(0 \le s \le t)\,X(s)\,dB(s) = \int_0^\tau \tilde{X}(s)\,dB(s), \quad t \in [0, \tau], \tag{4.17}$$

is valid since $\tilde{X}(s) = 1(0 \le s \le t)\,X(s)$ is in $\mathcal{H}^2[0, \tau]$, and $I(X)(t)$ is ambiguous only
on a set $A_t \in \mathcal{F}$ with $P(A_t) = 0$. This definition can be extended to a countable
number of times $t_i \in [0, \tau]$ since the sets $A_{t_i}$ on which the integrals $I(X)(t_i)$ take arbitrary
values are measurable and have probability 0, so that the integral at all these times
is not uniquely defined only on $\cup_i A_{t_i}$ and $P(\cup_i A_{t_i}) \le \sum_i P(A_{t_i}) = 0$. However, (4.17)
cannot be extended to all $t$ in $[0, \tau]$ since $[0, \tau]$ is uncountable, so that $\cup_{t \in [0, \tau]} A_t$
may not be an event and, even if it is, its measure may be strictly positive. This
difficulty is resolved by the following theorem.

Theorem 4.4 For any $X \in \mathcal{H}^2[0, \tau]$, there exists a continuous martingale
$M(t)$, $t \in [0, \tau]$, such that

$$P\left(\left\{\omega \in \Omega : M(t, \omega) = \int_0^\tau 1(0 \le s \le t)\,X(s, \omega)\,dB(s, \omega)\right\}\right) = 1 \tag{4.18}$$

for each $t \in [0, \tau]$, that is, the process $\int_0^\tau 1(0 \le s \le t)\,X(s)\,dB(s)$ has a modification
that is a continuous martingale with respect to the Brownian filtration ([7], Theorem
6.2).

The theorem shows that the process $I(X)(t)$ in (4.17) defined by an Ito integral is
useful provided we work with a modification of $I(X)(t)$ that is a continuous martingale
with respect to the Brownian filtration.

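Example 4.5 The process

$$M(t) = \left(\int_0^t X(u)\,dB(u)\right)^2 - \int_0^t X(u)^2\,du, \quad t \ge 0, \tag{4.19}$$

is a martingale. $M(t)$ has finite mean, is $\mathcal{F}_t$-adapted as a function of $B(s)$, $0 \le s \le t$,
and, for $t \ge s$, we have

$$E[M(t) \mid \mathcal{F}_s] = M(s) + E\left[\left(\int_s^t X(u)\,dB(u)\right)^2 + 2 \left(\int_0^s X(u)\,dB(u)\right)\left(\int_s^t X(u)\,dB(u)\right) - \int_s^t X(u)^2\,du \,\Big|\, \mathcal{F}_s\right]$$
$$= M(s) + E\left[\left(\int_s^t X(u)\,dB(u)\right)^2 - \int_s^t X(u)^2\,du \,\Big|\, \mathcal{F}_s\right].$$

The latter equality holds since $\int_0^s X(u)\,dB(u) \in \mathcal{F}_s$ and $\int_0^t X(u)\,dB(u)$ is a martingale
by Theorem 4.4 so that $E[\int_s^t X(u)\,dB(u) \mid \mathcal{F}_s] = 0$. Since
$E[(\int_s^t X(u)\,dB(u))^2 - \int_s^t X(u)^2\,du \mid \mathcal{F}_s] = 0$ by Theorem 4.3, we have $E[M(t) \mid \mathcal{F}_s] = M(s)$. ◊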

Example 4.6 The special case of $M(t)$ in (4.19) with $X(u) = 1$ shows that the
process $M(t) = (\int_0^t dB(u))^2 - \int_0^t du = B(t)^2 - t$ is a martingale. We can also see
that $M(t) = B(t)^2 - t$ is a martingale by direct calculations. By its definition, $M(t)$
has finite mean and is $\mathcal{F}_t$-adapted. For $t \ge s$,
$E[M(t) \mid \mathcal{F}_s] = E[B(t)^2 \mid \mathcal{F}_s] - t = E[(B(t) - B(s))^2 + B(s)^2 + 2\,(B(t) - B(s))\,B(s) \mid \mathcal{F}_s] - t = E[(B(t) - B(s))^2] + B(s)^2 + 2\,B(s)\,E[B(t) - B(s)] - t = (t - s) + B(s)^2 - t = M(s)$,
since $B(t) - B(s)$ is independent of $\mathcal{F}_s$ and $B(s) \in \mathcal{F}_s$. ◊
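The martingale property of $B(t)^2 - t$ can be illustrated by estimating the conditional expectation $E[B(t)^2 - t \mid B(s) = b]$ and comparing it with $b^2 - s$. The sketch below assumes a fixed conditioning value $b$ and uses the independence of the increment $B(t) - B(s) \sim N(0, t - s)$ from the past; the function name and sample size are illustrative.

```python
import math
import random


def conditional_mean_of_m(b_s, s, t, samples=40000, seed=0):
    """Monte Carlo estimate of E[B(t)^2 - t | B(s) = b_s] for t >= s, using
    B(t) = b_s + increment, with the increment ~ N(0, t - s) independent of the past."""
    rng = random.Random(seed)
    sd = math.sqrt(t - s)
    total = 0.0
    for _ in range(samples):
        b_t = b_s + rng.gauss(0.0, sd)
        total += b_t ** 2 - t
    return total / samples


if __name__ == "__main__":
    b_s, s, t = 0.7, 0.5, 2.0
    print("estimate of E[M(t) | B(s) = b]:", conditional_mean_of_m(b_s, s, t))
    print("M(s) = b^2 - s                :", b_s ** 2 - s)
```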
4.4.3 Integrands in $\mathcal{H}$

The set of integrands $\mathcal{H}^2$ is too restrictive. For example, the continuous function
$X(t) = \exp(B(t)^4)$ of a Brownian motion is not in $\mathcal{H}^2$ but is in $\mathcal{H}$. Localization
sequences are used to extend the Ito integral defined in the previous section to
integrands in $\mathcal{H}$.

Definition 4.2 An increasing sequence of stopping times $T_n$, $n = 1, 2, \ldots$, is an
$\mathcal{H}^2[0, \tau]$ localizing sequence for $X \in \mathcal{H}[0, \tau]$ if $X_n(t) = X(t)\,1(t \le T_n) \in \mathcal{H}^2[0, \tau]$
for all $n$ and $P(\cup_{n=1}^\infty \{T_n = \tau\}) = 1$. Recall that a random variable
$T : \Omega \to [0, \infty]$ is an $\mathcal{F}_t$-stopping time if $\{T \le t\} \in \mathcal{F}_t$ for all $t \ge 0$.

The random times

$$T_n(\omega) = \inf\left\{t : \int_0^t X(s, \omega)^2\,ds \ge n \ \text{or} \ t \ge \tau\right\}, \quad n = 1, 2, \ldots, \tag{4.20}$$

define a localizing sequence for $X \in \mathcal{H}$ since $P(\{\omega : \int_0^\tau X(s, \omega)^2\,ds < \infty\}) = 1$ and
$\cup_{n=1}^\infty \{\omega : T_n(\omega) = \tau\} = \{\omega : \int_0^\tau X(s, \omega)^2\,ds < \infty\}$. Note also that the processes
$X_n(t) = X(t)\,1(0 \le t \le T_n)$ are in $\mathcal{H}^2[0, \tau]$ and have the property $\|X_n\|^2_{L^2(dt \times dP)} \le n$
by construction.

We define the stochastic process $\int_0^t X(s)\,dB(s)$, $X \in \mathcal{H}$, in two steps. First, let
$M_n(t)$, $t \in [0, \tau]$, be the continuous $\mathcal{F}_t$-martingale in Theorem 4.4, which constitutes
a modification of $\int_0^\tau 1(0 \le s \le t)\,X_n(s)\,dB(s)$. Second, define $\int_0^\tau 1(0 \le s \le t)\,X(s)\,dB(s)$ as
the limit of $M_n(t)$ as $n \to \infty$, that is,

$$P\left(\int_0^\tau 1(0 \le s \le t)\,X(s)\,dB(s) = \lim_{n \to \infty} M_n(t)\right) = 1 \quad \text{for all } t \in [0, \tau]. \tag{4.21}$$

The definition in (4.21) is justified since $M_m(t) = M_n(t)$, $n \ge m$, for almost all
$\omega \in \{t \le T_m\}$ ([7], Proposition 7.2). Hence, there is a continuous process $M(t)$,
$0 \le t \le \tau$, such that $P(M(t) = \lim_{n \to \infty} M_n(t)) = 1$ for all $t \in [0, \tau]$ ([7], Proposition
7.3). If $M'_n(t)$ corresponds to another localizing sequence $\{T'_n\}$ for $X \in \mathcal{H}$, then
$\lim_{n \to \infty} M_n(t) = \lim_{n \to \infty} M'_n(t)$ a.s. at each $t \in [0, \tau]$ ([7], Proposition 7.4). Also,
if T is a stopping time and $X, Y \in \mathcal{H}$ are such that $X(s, \omega) = Y(s, \omega)$ for all $s \in [0, T]$,
then $\int_0^\tau 1(0 \le s \le t)\,X(s)\,dB(s) = \int_0^\tau 1(0 \le s \le t)\,Y(s)\,dB(s)$ for almost all
$\omega \in \{t \le T\}$ ([7], Proposition 7.5).

Example 4.7 Let $f : \mathbb{R} \to \mathbb{R}$ be a continuous function and consider the Ito integral
$I(X) = \int_0^\tau X(t)\,dB(t)$ with integrand $X(t) = f(B(t)) \in \mathcal{H}$. Then

$$\int_0^\tau f(B(t))\,dB(t) = \lim_{n \to \infty} \sum_{i=1}^n f(B(t_{i-1}))\,(B(t_i) - B(t_{i-1})) \tag{4.22}$$

in probability, where $t_i = i\,\tau/n$, $i = 0, 1, \ldots, n$ ([7], Theorem 7.1). This extends
results in Example 4.1 to functions $f$ other than the identity function. ◊
Proof For $a > 0$, note that $T_a = \inf\{t : |B(t)| \ge a \ \text{or} \ t \ge \tau\}$ is a localizing sequence
for $f(B(t))$, $f_a(B(t)) = f(B(t))\,1(|B(t)| \le a)$ is in $\mathcal{H}^2[0, \tau]$, and there exists a
sequence $f_{a,n}(B(t))$ in $\mathcal{H}_0^2[0, \tau]$ such that $\|f_{a,n} - f_a\|_{L^2(dt \times dP)} \to 0$ as $n \to \infty$.
That $f_{a,n}(B(t)) = \sum_{i=1}^n f_a(B(t_{i-1}))\,1(t_{i-1} < t \le t_i)$ is an approximating sequence
for $f_a(B(t))$ follows from

$$E\left[\int_0^\tau \left(f_{a,n}(B(t)) - f_a(B(t))\right)^2 dt\right] = E\left[\int_0^\tau \sum_{i=1}^n \left(f_a(B(t_{i-1})) - f_a(B(t))\right)^2 1(t_{i-1} < t \le t_i)\,dt\right]$$
$$\le \frac{\tau}{n} \sum_{i=1}^n E\left[\sup_{t \in (t_{i-1}, t_i]} \left(f_a(B(t_{i-1})) - f_a(B(t))\right)^2\right]$$

since the random variables $\sup_{t \in (t_{i-1}, t_i]} (f_a(B(t_{i-1})) - f_a(B(t)))^2$ are bounded and
$E[\sup_{t \in (t_{i-1}, t_i]} (f_a(B(t_{i-1})) - f_a(B(t)))^2] \to 0$ as $n \to \infty$ by the a.s. uniform
continuity of the samples of B on $[0, \tau]$ and dominated convergence.

The Ito isometry implies $\int_0^\tau f_a(B(s))\,dB(s) = \lim_{n \to \infty} \sum_{i=1}^n f_a(B(t_{i-1}))\,(B(t_i) - B(t_{i-1}))$,
where the convergence is in $L^2(dP)$. It remains to show that the probability
of the event $A_n(\varepsilon) = \{|\sum_{i=1}^n f(B(t_{i-1}))\,(B(t_i) - B(t_{i-1})) - \int_0^\tau f(B(s))\,dB(s)| > \varepsilon\}$
converges to 0 as $n \to \infty$. We have $P(A_n(\varepsilon)) = P(A_n(\varepsilon) \cap \{T_a < \tau\}) + P(A_n(\varepsilon) \cap \{T_a = \tau\}) \le P(T_a < \tau) + P(A_n(\varepsilon) \cap \{T_a = \tau\})$,
$P(T_a < \tau) \to 0$ as $a \to \infty$, and
the probability of $A_n(\varepsilon) \cap \{T_a = \tau\}$ can be bounded by
$\|\sum_{i=1}^n f_a(B(t_{i-1}))\,(B(t_i) - B(t_{i-1})) - \int_0^\tau f_a(B(s))\,dB(s)\|^2_{L^2(dP)} / \varepsilon^2$, which converges to 0 as $n \to \infty$. ▲

Example 4.8 The output of linear filters to Gaussian white noise is a special type
of Ito integral with deterministic integrand. The solution $M(t)$ of a linear filter
at a time $t \ge 0$ defined by $dM(t) = -\alpha\,M(t)\,dt + dB(t)$, $\alpha > 0$, is
$M(t) = \int_0^t \exp(-\alpha\,(t - s))\,dB(s)$ for $M(0) = 0$, that is, an Ito integral with deterministic
integrand. The process $M(t)$ has mean 0 and variance
$E[M(t)^2] = \int_0^t \exp(-2\,\alpha\,(t - s))\,ds = (1 - \exp(-2\,\alpha\,t))/(2\,\alpha)$. ◊
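The variance in Example 4.8 can be recovered from Ito's isometry applied to step approximations of the integrand. The sketch below, with illustrative function names, replaces $\exp(-\alpha\,(t - s))$ by its value at the left endpoint of each interval of a uniform partition; by the isometry, the variance of the resulting step integral is a left Riemann sum of $\exp(-2\,\alpha\,(t - s))$, which approaches the closed form as the partition is refined. No random numbers are needed for this check.

```python
import math


def step_integrand_variance(alpha, t, n):
    """Variance of sum_k exp(-alpha (t - s_{k-1})) [B(s_k) - B(s_{k-1})] on a uniform
    partition of [0, t]; by Ito's isometry it equals
    sum_k exp(-2 alpha (t - s_{k-1})) (s_k - s_{k-1})."""
    ds = t / n
    return sum(math.exp(-2.0 * alpha * (t - k * ds)) * ds for k in range(n))


def exact_variance(alpha, t):
    """Closed form (1 - exp(-2 alpha t)) / (2 alpha) from Example 4.8."""
    return (1.0 - math.exp(-2.0 * alpha * t)) / (2.0 * alpha)


if __name__ == "__main__":
    alpha, t = 1.5, 2.0
    for n in (10, 100, 1000, 10000):
        print(n, step_integrand_variance(alpha, t, n), exact_variance(alpha, t))
```

Since the integrand increases with $s$, the left-endpoint sums approach the exact variance from below.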


4.5 Stochastic Integrals with Martingale Integrators 

We consider stochastic integrals $\int_0^\tau X(t)\,dM(t)$ with integrands X satisfying some
conditions and integrators M that are square integrable martingales with right
continuous samples that have left limits. An example is the integral $\int \tilde{C}_-\,d\tilde{C}$, where
$\tilde{C}(t) = \sum_{k=1}^\infty Y_k\,1(t \ge T_k) - \lambda\,t\,E[Y_1]$, $t \ge 0$, is the compensated compound Poisson
process, $\{T_k\}$ are the jump times of a homogeneous Poisson process $N(t)$ with
intensity $\lambda > 0$, $\{Y_k\}$ are iid random variables with finite variance, and
$\tilde{C}_-(t) = \lim_{s \uparrow t} \tilde{C}(s)$.
Example 4.4 shows that Riemann-Stieltjes integrals can be defined for right
continuous integrators and left continuous integrands provided their discontinuities are
not coincident. Since martingales admit right continuous modifications and M in
$\int_0^\tau X(t)\,dM(t)$ is assumed to be a right continuous martingale, it is natural to require
that the integrand X has left continuous samples.

Let $(\Omega, \mathcal{F}, P)$ be a probability space with a filtration $\mathcal{F}_t$, $t \ge 0$. Denote by $\mathcal{L}$
and $\mathcal{D}$ the collections of all jointly measurable, $\mathcal{F}_t$-adapted real-valued stochastic
processes that have left continuous samples with right limits and right continuous
samples with left limits, respectively.

Definition 4.3 Let $\mathcal{P}$ and $\mathcal{O}$ be the smallest $\sigma$-fields on $[0, \tau] \times \Omega$ with respect
to which all processes $X(t)$, $t \in [0, \tau]$, in $\mathcal{L}$ and $\mathcal{D}$ are jointly measurable. If the
mapping $(t, \omega) \mapsto X(t, \omega)$ is $\mathcal{P}$-measurable, X is said to be predictable. If this
mapping is $\mathcal{O}$-measurable, X is said to be optional.

Note that (1) the $\sigma$-field $\mathcal{P}$ is generated by subsets of $[0, \tau] \times \Omega$ of the type
$(s, t] \times A$, $0 \le s < t \le \tau$, $A \in \mathcal{F}_s$, and $\{0\} \times A$, $A \in \mathcal{F}_0$ ([4], Sect. 6.3),
(2) predictable and optional processes are measurable since the $\sigma$-fields $\mathcal{P}$ and
$\mathcal{O}$ are included in $\mathcal{B}([0, \tau]) \times \mathcal{F}$, (3) predictable processes allow "a peek into the
future" since their samples are left continuous, and (4) any predictable process is
also optional, that is, $\mathcal{P} \subset \mathcal{O}$. Some of these concepts have been already discussed
(Definition 3.35).

Let $(\Omega, \mathcal{F}, P)$ be a probability space and $\mathcal{F}_t$, $0 \le t \le \tau$, a filtration on this space
with respect to which the integrator $M$ of $\int_0^\tau X\,dM$ is a martingale with right continuous
samples and $X$ is a predictable process. It is assumed that the filtration satisfies
the usual hypothesis, that is, $\mathcal{F}_0$ contains all sets of $P$-measure 0 (completeness) and
$\mathcal{F}_t = \cap_{s > t}\,\mathcal{F}_s$ for all $t$ (right continuity).

We have seen that discrete time submartingales admit the Doob decomposition 
(Theorem 2.18). A similar decomposition, referred to as the Doob-Meyer decompo- 
sition, is available for continuous time submartingales. We state a simpler version 
of this decomposition that is needed for the construction of stochastic integrals with 
martingale integrators. Technical details on the Doob-Meyer decomposition can be 
found in [6] (Sect. 3.2) and [2] (Theorem 5.1). 

Theorem 4.5 (Doob-Meyer decomposition). If $M$ is a square integrable martingale
with samples in $\mathcal{D}$, there exists a unique decomposition

$$M(t)^2 = L(t) + \langle M \rangle(t), \quad 0 \le t \le \tau, \qquad (4.23)$$

where $L(t) \in \mathcal{D}$ is a martingale and $\langle M \rangle(t)$, referred to as the compensator of
$M(t)^2$, is a predictable process with increasing samples such that $\langle M \rangle(0) = 0$ and
$E[\langle M \rangle(t)] < \infty$ for all $t \in [0, \tau]$ ([2], Theorem 5.1).

The compensator $\langle M \rangle(t)$ defined by the Doob-Meyer decomposition in (4.23) is
unique. If $M$ is continuous, the processes $\langle M \rangle(t)$ and $[M, M](t)$ are indistinguishable
([2], Sect. 2.6), where $[M, M] = [M]$ denotes the quadratic variation of $M$ that is
defined in the following section. If $M$ is a Brownian motion $B$, then $[M, M](t) =
\langle M \rangle(t) = t$, where the last equality holds since $[B](t) = t$ (Sect. 3.7.6.1).

Example 4.9 Let $M(t) = N(t) - \lambda\,t$, $t \ge 0$, where $N$ is a Poisson process with
intensity $\lambda > 0$. The compensator of $M(t)^2$ is $\langle M \rangle(t) = \lambda\,t$. ◊

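Proof Since $M(t)$ is a square integrable martingale with samples in $\mathcal{D}$, we need
to show that $L(t) = M(t)^2 - \lambda\,t$ is a martingale with respect to the filtration $\mathcal{F}_t$
generated by $N$ that has samples in $\mathcal{D}$. We have $E[|L(t)|] < \infty$ and $L(t) \in \mathcal{F}_t$, $t \ge 0$,
by construction, and

$$\begin{aligned}
E[M(t)^2 \mid \mathcal{F}_s] &= E\big[\big(N(t) - N(s) - \lambda\,(t-s) + (N(s) - \lambda\,s)\big)^2 \mid \mathcal{F}_s\big] \\
&= E\big[\big(N(t) - N(s) - \lambda\,(t-s)\big)^2 \mid \mathcal{F}_s\big] \\
&\quad + 2\,E\big[\big(N(t) - N(s) - \lambda\,(t-s)\big)\,(N(s) - \lambda\,s) \mid \mathcal{F}_s\big] \\
&\quad + E\big[(N(s) - \lambda\,s)^2 \mid \mathcal{F}_s\big] = \lambda\,(t-s) + (N(s) - \lambda\,s)^2 = M(s)^2 + \lambda\,(t-s)
\end{aligned}$$

for $s \le t$. Hence, $M(t)^2 - \lambda\,t$, $t \ge 0$, is a martingale with samples in $\mathcal{D}$. Since the
Doob-Meyer decomposition is unique, we have $\langle M \rangle(t) = \lambda\,t$. ▲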
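
The statement of Example 4.9 can be checked numerically. The following is a minimal Monte Carlo sketch that is not part of the original text; the function names, the seed, and the parameter values are illustrative assumptions. It samples $N(t)$ from exponential interarrival times and estimates $E[M(t)^2 - \lambda\,t]$, which should be close to zero if $\lambda\,t$ is the compensator of $M(t)^2$.

```python
import random


def poisson_count(lam, t, rng):
    """Sample N(t) for a Poisson process of intensity lam via exponential gaps."""
    count, clock = 0, rng.expovariate(lam)
    while clock <= t:
        count += 1
        clock += rng.expovariate(lam)
    return count


def mean_compensated_square(lam, t, n_samples=40000, seed=0):
    """Monte Carlo estimate of E[M(t)^2 - lam*t], where M(t) = N(t) - lam*t."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        m = poisson_count(lam, t, rng) - lam * t
        total += m * m - lam * t
    return total / n_samples


if __name__ == "__main__":
    # Illustrative values only: intensity 2 and time 1.5.
    print(mean_compensated_square(lam=2.0, t=1.5))
```

The estimate fluctuates around zero with a standard error that decreases like the inverse square root of the number of samples; it only tests the mean of $L(t)$, not the full martingale property.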

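Let $\mathcal{L}^2_{\mathrm{pred}}[0, \tau] = \mathcal{L}^2_{\mathrm{pred}}$ be the collection of predictable processes $X(t)$, $0 \le t \le \tau$,
with the property $E[\int_0^\tau |X(t)|^2\,d\langle M \rangle(t)] < \infty$. For the special case $M = B$,
we have $d\langle M \rangle(t) = dt$ and $\mathcal{L}^2_{\mathrm{pred}} = \mathcal{H}^2$. The construction of the Ito integral
$\int_0^\tau X(t)\,dM(t)$ with martingale integrator is similar to that with Brownian motion
integrator. First, $\int_0^\tau X(t)\,dM(t)$ is defined for integrands $X \in \mathcal{L}^2_{\mathrm{pred}}$ that are step
processes. Second, the resulting integral is extended to arbitrary $X \in \mathcal{L}^2_{\mathrm{pred}}$. Third, the
Ito integral is defined for integrands $X$ in the class $\mathcal{L}_{\mathrm{pred}}$ of predictable processes with
the property $\int_0^\tau |X(t)|^2\,d\langle M \rangle(t) < \infty$ a.s., rather than $E[\int_0^\tau |X(t)|^2\,d\langle M \rangle(t)] < \infty$.
Note that $\mathcal{L}_{\mathrm{pred}}$ is a larger class of integrands, that is, $\mathcal{L}_{\mathrm{pred}} \supset \mathcal{L}^2_{\mathrm{pred}}$.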

For a step process $X$ in $\mathcal{L}^2_{\mathrm{pred}}$ of the type in (4.11), the Ito integral is

$$I(X) = \sum_{i=1}^{n} A_i\,\big(M(t_i) - M(t_{i-1})\big), \qquad (4.24)$$

that is, (4.12) with $M$ in place of $B$. The following result resembles Theorem 4.1.
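
Theorem 4.6 For $I(X)$ in (4.24) we have

$$E\big[|I(X)|^2\big] = E\left[\int_0^\tau |X(t)|^2\,d\langle M \rangle(t)\right]. \qquad (4.25)$$

Proof Set $\Delta M_i = M(t_i) - M(t_{i-1})$ and note that

$$E\big[|I(X)|^2\big] = \sum_{i=1}^{n} E\big[A_i^2\,(\Delta M_i)^2\big] + \sum_{i \ne j} E\big[A_i\,A_j\,\Delta M_i\,\Delta M_j\big] = \sum_{i=1}^{n} E\big[A_i^2\,(\Delta M_i)^2\big]$$

since $E[A_i\,A_j\,\Delta M_i\,\Delta M_j] = E\{A_i\,A_j\,\Delta M_j\,E[\Delta M_i \mid \mathcal{F}_{t_{i-1}}]\}$ for $i > j$ and the
conditional expectation $E[\Delta M_i \mid \mathcal{F}_{t_{i-1}}]$ is zero. This observation implies

$$E\big[|I(X)|^2\big] = \sum_{i=1}^{n} E\big\{A_i^2\,E\big[\langle M \rangle(t_i) - \langle M \rangle(t_{i-1}) \mid \mathcal{F}_{t_{i-1}}\big]\big\} = E\left[\int_0^\tau |X(t)|^2\,d\langle M \rangle(t)\right]$$

since $M(t)^2 = L(t) + \langle M \rangle(t)$ by the Doob-Meyer decomposition in (4.23) and $L(t)$ is an
$\mathcal{F}_t$-martingale so that $E[L(t_i) - L(t_{i-1}) \mid \mathcal{F}_{t_{i-1}}] = 0$. ▲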

Let $X \in \mathcal{L}^2_{\mathrm{pred}}$ and let $\{X_n\}$ be a sequence of step processes in $\mathcal{L}^2_{\mathrm{pred}}$ such that
$E[\int_0^\tau |X(t) - X_n(t)|^2\,d\langle M \rangle(t)] \to 0$ as $n \to \infty$. Theorem 4.6 shows that $\{I(X_n)\}$ is
a Cauchy sequence in $L^2(dP)$ so that it has a unique limit in this space denoted by
$I(X) = \lim_{n \to \infty} I(X_n)$. Since $I(X)$ does not depend on the particular sequence $\{X_n\}$,
it is well defined. The limit $I(X) = \lim_{n \to \infty} I(X_n)$ is called the stochastic integral of
$X$ with respect to martingale $M$.
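
The isometry in (4.25) can be illustrated for the compensated Poisson martingale $M(t) = N(t) - \lambda\,t$ of Example 4.9. The sketch below is an illustration that is not part of the original text; the integrand $X(s) = (-1)^{N(s-)}$, the function names, and the parameter values are assumptions chosen so that $|X| = 1$ and the integral can be evaluated path by path, as in Example 4.4. With $d\langle M \rangle(t) = \lambda\,dt$, the isometry then predicts $E[I(X)^2] = \lambda\,t$.

```python
import random


def alternating_integral(lam, t, rng):
    """One sample of I = int_0^t X(s) dM(s), X(s) = (-1)**N(s-), M(s) = N(s) - lam*s."""
    jump_part = 0.0   # sum of X(T_k) over the jump times T_k <= t
    drift_part = 0.0  # int_0^t X(s) ds, accumulated between consecutive jumps
    sign, last = 1.0, 0.0
    clock = rng.expovariate(lam)
    while clock <= t:
        drift_part += sign * (clock - last)
        jump_part += sign  # X is left continuous: use its value just before the jump
        sign, last = -sign, clock
        clock += rng.expovariate(lam)
    drift_part += sign * (t - last)
    return jump_part - lam * drift_part


def second_moment(lam, t, n_samples=40000, seed=1):
    """Monte Carlo estimate of E[I^2]; the isometry predicts lam * t."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += alternating_integral(lam, t, rng) ** 2
    return total / n_samples
```

For example, `second_moment(2.0, 1.0)` should be close to 2, up to Monte Carlo error.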

Example 4.10 If $M(t)$ is a continuous square integrable martingale, then the stochastic
integral with $M$ as integrand and integrator is $\int_0^t M(s)\,dM(s) = (M(t)^2 -
M(0)^2 - \langle M \rangle(t))/2 = (M(t)^2 - M(0)^2 - [M](t))/2$. ◊

Proof Set $J_{M,n}(M) = \sum_{i=1}^{n} M(t_{i-1})\,\Delta M_i$ as in Example 4.1, where $\Delta M_i = M(t_i) -
M(t_{i-1})$ and $p_n = \{t_0, t_1, \ldots, t_n\}$, $0 = t_0 < t_1 < \cdots < t_n = t$, is a sequence
of partitions of $[0, t]$ with intermediate points $t_i' = t_{i-1}$ such that $\Delta(p_n) =
\max_{1 \le i \le n}(t_i - t_{i-1}) \to 0$ as $n \to \infty$. Our objective is to show the convergence
$J_{M,n}(M) \to (M(t)^2 - M(0)^2 - \langle M \rangle(t))/2$. Since $J_{M,n}(M) = M(t)^2/2 - M(0)^2/2 -
\sum_{i=1}^{n} (\Delta M_i)^2/2$, it is sufficient to show the convergence $\sum_{i=1}^{n} (\Delta M_i)^2 \to [M](t) =
\langle M \rangle(t)$, where the latter equality is valid since $M$ is assumed to have continuous
samples. This convergence is in probability and follows from Theorem 4.11 stated in the
following section. ▲

We conclude this section with a theorem given without proof that is useful for
calculations, and a brief outline on the construction of the Ito integral for integrands
$X \in \mathcal{L}_{\mathrm{pred}}$.

Theorem 4.7 If $X \in \mathcal{L}^2_{\mathrm{pred}}$, the stochastic process $I(X)(t) = \int_0^t X(s)\,dM(s)$, $t \in
[0, \tau]$, is a martingale and $E[I(X)(t)^2] = E[\int_0^t |X(s)|^2\,d\langle M \rangle(s)]$. Moreover, the
process $I(X)(t)$ has right continuous samples with left limits and the compensator
of $(I(X)(t))^2$ is $\langle I(X) \rangle(t) = \int_0^t |X(s)|^2\,d\langle M \rangle(s)$ ([4], Theorems 6.5.8 and 6.5.9).

Let $X \in \mathcal{L}_{\mathrm{pred}}$ and set $X_n(t) = X(t)\,1\big(\int_0^t |X(s)|^2\,d\langle M \rangle(s) \le n\big)$. Since $X_n \in
\mathcal{L}^2_{\mathrm{pred}}$, the Ito integral $I(X_n)$ is defined. It can be shown that the sequence $\{I(X_n)\}$
is convergent in probability and its limit $I(X) = \lim_{n \to \infty} I(X_n)$ is well defined ([4],
Chap. 6). The limit $I(X)$ is called the Ito integral of $X \in \mathcal{L}_{\mathrm{pred}}$ with respect to a
martingale $M$. The following result is for a special type of stochastic integral $I(X)$.

Theorem 4.8 If $X \in \mathcal{L}_{\mathrm{pred}}$, $M$ is a square integrable martingale, and both $X$ and $M$
have continuous samples, then $\int_0^\tau X(t)\,dM(t) = \lim_{\Delta(p_n) \to 0} \sum_{i=1}^{n} X(t_{i-1})\,(M(t_i) -
M(t_{i-1}))$ in probability, where $p_n = \{t_0, t_1, \ldots, t_n\}$, $0 = t_0 < t_1 < \cdots < t_n = \tau$, is
a sequence of partitions of $[0, \tau]$ and $\Delta(p_n) = \max_{1 \le i \le n}(t_i - t_{i-1})$ ([4], Theorem
6.6.5).


4.6 Stochastic Integrals with Semimartingale Integrators 

Stochastic integrals with Brownian motion and martingale integrators considered
in previous sections are extended to integrals with semimartingale integrators.

Definition 4.4 A process $Y \in \mathcal{D}$ is a semimartingale if and only if it admits the
representation

$$Y(t) = Y(0) + M(t) + A(t), \qquad (4.26)$$

where $M(0) = A(0) = 0$, $M(t)$ is a local martingale, and $A(t)$ is a process in $\mathcal{D}$ of
finite variation on compacts ([6], Theorem 14, p. 105 and Theorem 22, p. 114).

Recall that $M$ is a local martingale if there exists an increasing sequence $T_n$, $n =
1, 2, \ldots$, of stopping times such that $\lim_{n \to \infty} T_n = +\infty$ a.s. and $M(t \wedge T_n)$ is a
martingale for each $n$ (Definition 3.34). A process is said to be of finite variation on
compacts if almost all its samples are of finite variation on each compact of $\mathbb{R}$.

Example 4.11 The square of a Brownian motion $B(t)$ is a semimartingale since
it admits the representation $Y(t) = B(t)^2 = Y(0) + A(t) + M(t)$ with $Y(0) =
0$, $A(t) = t$, and $M(t) = 2 \int_0^t B(s)\,dB(s)$. ◊

Proof The representation of $Y(t)$ results from (4.6). That $M(t)$ is a martingale follows
from the construction of Ito’s integral with Brownian integrator. Note also that $A(t) =
t$ is of finite variation on compacts and continuous, so that $A \in \mathcal{D}$, and that $A(0) = 0$.
Also, $M(0) = 0$ by its definition. ▲

Example 4.12 Compound Poisson processes with integrable jumps are semimartingales. ◊

Proof Let $C(t) = \sum_{k=1}^{N(t)} Y_k$, $t \ge 0$, be a compound Poisson process, where $N(t)$
is a Poisson process with intensity $\lambda > 0$ and $\{Y_k\}$ denote iid random variables with
finite mean. The process $C(t)$ is a semimartingale since it admits the representation
$C(t) = M(t) + A(t)$, where $M(t) = C(t) - \lambda\,E[Y_1]\,t$ is a martingale and $A(t) =
\lambda\,E[Y_1]\,t$ is adapted and of finite variation. ▲

Example 4.13 The approximate representation $L_{\alpha,a}(t) = \sigma(a)\,B(t) + C_{\alpha,a}(t)$ of an
$\alpha$-stable process $L_\alpha(t)$ in (3.62) is a semimartingale for $\alpha \in (1, 2]$ since $B(t)$ is a
martingale and $C_{\alpha,a}(t)$ is a semimartingale. ◊

We now consider stochastic integrals $\int X\,dY$, where the integrand $X$ is a process
in $\mathcal{L}$ and the integrator $Y$ is a semimartingale. The construction of these integrals
is similar to that of Ito’s integrals with Brownian motion and martingale integrators.
Stochastic integrals $\int X\,dY$ are first defined for integrands $X \in \mathcal{L}$ that are step
processes. This definition is subsequently extended to arbitrary integrands $X \in \mathcal{L}$.
A comprehensive discussion on stochastic integrals with semimartingale integrators
can be found in [6] (Sect. 2.4).

Example 4.14 Consider the stochastic integral $\int_0^t B(s)\,dC(s)$, where $B$ denotes a
Brownian motion, so that $B \in \mathcal{L}$, and $C$ is a compound Poisson process with jumps
$\{Y_k\}$ that have finite mean and occur at the jump times $\{T_k\}$ of a Poisson process $N(t)$,
so that $C$ is a semimartingale. Considerations as in Example 4.4 give $\int_0^t B(s)\,dC(s) =
\sum_{k=1}^{N(t)} B(T_k)\,\Delta C(T_k) = \sum_{k=1}^{N(t)} B(T_k)\,Y_k$ so that the jumps of $\int_0^t B(s)\,dC(s)$ coincide
with those of $B(s)\,\Delta C(s)$, $s \in (0, t]$. Also, the Ito integral $\int_0^t B(s)\,dC(s)$ coincides with
the path-by-path Riemann-Stieltjes integral. ◊

The observations in Example 4.14 are consistent with results in [6] related to Ito
integrals $\int_0^t X(s)\,dY(s)$ with integrands $X \in \mathcal{L}$ and semimartingale integrators $Y$. It
is shown in [6] that the jumps of $\int_0^t X(s)\,dY(s)$ are indistinguishable from the process
$X(s)\,\Delta Y(s)$, $s \in (0, t]$ (Theorem 13, p. 53) and that, when $Y$ has paths of finite variation
on compacts, $\int_0^t X(s)\,dY(s)$ is indistinguishable from its definition as a path-by-path
Riemann-Stieltjes integral ([6], Theorem 17, p. 54).

The following theorems show that stochastic integrals preserve an essential property
of their integrators $Y$ for integrands $X \in \mathcal{L}$.

Theorem 4.9 The process $Z_1(t) = \int_0^t X(s)\,dY(s)$ is a semimartingale. Moreover, if
$G \in \mathcal{L}$, then $Z_2(t) = \int_0^t G(s)\,dZ_1(s) = \int_0^t G(s)\,X(s)\,dY(s)$ is also a semimartingale
defined by a stochastic integral with integrand $G\,X$ and integrator $Y$ ([6], Theorem
19, p. 55). The latter property is referred to as associativity.

An alternative statement of Theorem 4.9 is that the coordinates of the $\mathbb{R}^2$-valued
process $Z(t) = (Z_1(t), Z_2(t))$ defined by

$$dZ(t) = \begin{bmatrix} X(t) \\ G(t)\,X(t) \end{bmatrix} dY(t)$$

are semimartingales. Note that $Z(t)$ can be interpreted as the state of a dynamic
system subjected to semimartingale noise.

Theorem 4.10 If $X \in \mathcal{L}$ and $Y$ is a locally square integrable local martingale, then
$\int_0^t X(s)\,dY(s)$ is a locally square integrable local martingale ([6], Theorem 20, p.
56). If $X \in \mathcal{L}$ and $Y$ is a local martingale, then $\int_0^t X(s)\,dY(s)$ is a local martingale
([6], Theorem 17, p. 106).

The statement in this theorem is known as the preservation property for stochastic
integrals. A property is said to hold locally for a process $Y(t)$ if there exists an
increasing sequence of stopping times $T_1 \le T_2 \le \cdots$ such that $T_n \to \infty$ a.s. and
$Y(t \wedge T_n)$ has this property for each $n \ge 1$. If, in addition, $Y(t \wedge T_n) \in L^2$, then $Y(t)$
is called a locally square integrable local martingale.

Example 4.15 The stochastic integral $I(X)(t) = \int_0^t X(s)\,dB(s)$, $t \ge 0$, in Sect. 4.4.2
is a locally square integrable local martingale since its integrator $B$ is a square integrable
martingale, so that it is also a locally square integrable local martingale. ◊


4.7 Quadratic Variation and Covariation Processes 

Let $X$ and $Y$ be semimartingales with the property $X(0-) = Y(0-) = 0$ so that
their jumps at $t = 0$ are $\Delta X(0) = X(0)$ and $\Delta Y(0) = Y(0)$, where $\Delta X(t) =
X(t) - X(t-) = X(t) - X_-(t)$ and $X(t-) = X_-(t) = \lim_{s \uparrow t} X(s)$. We define
quadratic variation and covariation processes for semimartingales. These processes
are essential for both developing and using Ito’s formula.

Definition 4.5 The quadratic variation process of $X$, denoted by $[X, X]$ or $[X]$, is
given by

$$[X, X](t) = X(t)^2 - 2 \int_0^t X(s-)\,dX(s). \qquad (4.27)$$

The stochastic integral $\int_0^t X(s-)\,dX(s)$ is defined since $X_- \in \mathcal{L}$ and $X \in \mathcal{D}$ ([6],
Sects. 2.4 and 2.5). Note also that (4.27) holds at $t = 0$ since the left and the right sides
of this equation are $[X, X](0) = (\Delta X(0))^2 = X(0)^2$ and $X(0)^2 - 2\,X(0-)\,\Delta X(0) =
X(0)^2$.

Example 4.16 The quadratic variation of Brownian motion is $[B, B](t) = [B](t) = t$.
This follows from the definition of the quadratic variation process and the expression
of $\int B\,dB$ in Example 4.1. See also Example 3.50. ◊

Example 4.17 The quadratic variation of a Poisson process $N(t)$ is $[N, N](t) = N(t)$
by (4.27) and the expression of $\int N_-\,dN$ in Example 4.4. See also Example 3.52. ◊
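
The result of Example 4.17 can also be seen from sums of squared increments over fine partitions, the characterization of quadratic variation given in Theorem 4.11. The sketch below is illustrative and not part of the original text; the jump times and function names are hypothetical. Once the partition mesh is smaller than the smallest gap between jumps, each cell contains at most one unit jump and the sum equals $N(t)$.

```python
def step_path(jump_times, s):
    """Value at time s of a right continuous counting path with the given jump times."""
    return sum(1 for tk in jump_times if tk <= s)


def partition_sum_of_squares(path, t, n):
    """Sum of squared increments of `path` over a uniform partition of [0, t] with n cells."""
    total, prev = 0.0, path(0.0)
    for k in range(1, n + 1):
        cur = path(t * k / n)
        total += (cur - prev) ** 2
        prev = cur
    return total


# Hypothetical jump times of one Poisson sample path on [0, 3], so N(3) = 3.
jumps = [0.3, 1.1, 2.5]
qv_fine = partition_sum_of_squares(lambda s: step_path(jumps, s), t=3.0, n=1000)
qv_coarse = partition_sum_of_squares(lambda s: step_path(jumps, s), t=3.0, n=1)
```

Here `qv_fine` equals $N(3) = 3$, while the single-cell sum `qv_coarse` equals $N(3)^2 = 9$, which shows why the limit over refining partitions is needed.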

Theorem 4.11 The process $[X, X]$ in (4.27) is adapted with increasing samples in $\mathcal{D}$,
has the properties $[X, X](0) = X(0)^2$ and $\Delta[X, X](t) = [X, X](t) - [X, X](t-) =
(\Delta X(t))^2$, and can be obtained for each $t > 0$ and $s \in [0, t]$ from

$$X(0)^2 + \sum_{k=1}^{n} \big(X(t_k \wedge s) - X(t_{k-1} \wedge s)\big)^2 \xrightarrow{\ \mathrm{ucp}\ } [X, X](s), \qquad (4.28)$$

where $p_n = \{t_0, t_1, \ldots, t_n\}$, $0 = t_0 < t_1 < \cdots < t_n = t$, is a sequence of partitions
with $\Delta(p_n) = \max_{1 \le k \le n}(t_k - t_{k-1}) \to 0$ as $n \to \infty$.

Proof For proof see [6] (Theorem 22, p. 59). A sequence of processes $H_n$, $n =
1, 2, \ldots$, converges to a process $H$ uniformly on compacts in probability (ucp)
if, for each $t > 0$, $\sup_{0 \le s \le t} |H_n(s) - H(s)| \to 0$ as $n \to \infty$ in probability.
We only note that the definitions of $[X, X]$ and of its jumps at an arbitrary time $t$ give

$$\begin{aligned}
\Delta[X, X](t) &= X(t)^2 - 2 \int_0^t X(s-)\,dX(s) - X(t-)^2 + 2 \int_0^{t-} X(s-)\,dX(s) \\
&= \Delta X(t)\,\big(X(t) + X(t-)\big) - 2\,X(t-)\,\Delta X(t) = \Delta X(t)\,\big(X(t) - X(t-)\big) = (\Delta X(t))^2. \ ▲
\end{aligned}$$

Theorem 4.12 If $X$ is an $\mathcal{F}_t$-square integrable martingale, then $X^2 - [X, X]$ is an
$\mathcal{F}_t$-martingale.

Proof The process has finite expectation since $X$ is square integrable and $[X, X] \in L^1$
([2], Proposition 3.4). It is $\mathcal{F}_t$-adapted since $X$ is a martingale and $[X, X](t)$ depends
on $X(s)$, $s \le t$. For $t \ge s$,

$$\begin{aligned}
E[X(t)^2 - X(s)^2 \mid \mathcal{F}_s] &= E[(X(t) - X(s))^2 + 2\,X(t)\,X(s) - 2\,X(s)^2 \mid \mathcal{F}_s] \\
&= E[(X(t) - X(s))^2 \mid \mathcal{F}_s] + 2\,E[X(t)\,X(s) \mid \mathcal{F}_s] - 2\,E[X(s)^2 \mid \mathcal{F}_s] \\
&= E[(X(t) - X(s))^2 \mid \mathcal{F}_s],
\end{aligned}$$

which gives

$$\begin{aligned}
E\Big[\sum_{k=1}^{n} (X(t_k) - X(t_{k-1}))^2 \mid \mathcal{F}_s\Big]
&= \sum_{k=1}^{n} E\big\{E[(X(t_k) - X(t_{k-1}))^2 \mid \mathcal{F}_{t_{k-1}}] \mid \mathcal{F}_s\big\} \\
&= \sum_{k=1}^{n} E\big\{E[X(t_k)^2 - X(t_{k-1})^2 \mid \mathcal{F}_{t_{k-1}}] \mid \mathcal{F}_s\big\}
= \sum_{k=1}^{n} E\big[X(t_k)^2 - X(t_{k-1})^2 \mid \mathcal{F}_s\big] \\
&= E\Big[\sum_{k=1}^{n} \big(X(t_k)^2 - X(t_{k-1})^2\big) \mid \mathcal{F}_s\Big] = E[X(t)^2 - X(s)^2 \mid \mathcal{F}_s]
\end{aligned}$$

for any partition $p_n = \{t_0, t_1, \ldots, t_n\}$, $s = t_0 < t_1 < \cdots < t_n = t$, of the interval
$[s, t]$. The left side of the above equalities converges in $L^1$ to $[X, X](t) - [X, X](s)$
as $\Delta(p_n) \to 0$ ([2], Proposition 3.4) so that $E[X(t)^2 - [X, X](t) \mid \mathcal{F}_s] = E[X(s)^2 -
[X, X](s) \mid \mathcal{F}_s] = X(s)^2 - [X, X](s)$ a.s., showing that $X^2 - [X, X]$ is a martingale.
It can also be shown that, if $X$ is a local square integrable martingale, $X^2 - [X]$ is a
local martingale ([2], Proposition 6.1). ▲

Let $X(t)$ be a square integrable martingale. Then $X(t)^2 - \langle X \rangle(t)$ and $X(t)^2 -
[X, X](t)$ are martingales, where $\langle X \rangle$ denotes the compensator in the Doob-Meyer
decomposition. Since the collection of martingales defines a linear space, $\langle X \rangle -
[X, X]$ is also a martingale. We have seen that $\langle X \rangle$ and $[X, X]$ are indistinguishable for
continuous martingales. Generally, $\langle X \rangle$ and $[X, X]$ differ, as illustrated by the following
example.

Example 4.18 Let $X(t) = N(t) - t$, $t \ge 0$, where $N$ is a Poisson process with unit
intensity. The compensator of $X(t)^2$ is $\langle X \rangle(t) = t$ (Example 4.9), while the quadratic
variation process of $X(t)$ is $[X, X](t) = N(t)$ (Example 3.52). ◊

Example 4.19 Let $C(t) = \sum_{k=1}^{N(t)} Y_k$ be a compound Poisson process, where $N$ is
a Poisson process with intensity $\lambda > 0$ and $\{Y_k\}$ are iid random variables. The
stochastic integral $\int C_-\,dC$ is $\int_0^t C(s-)\,dC(s) = \big(C(t)^2 - \sum_{k=1}^{N(t)} Y_k^2\big)/2$. If $Y_k = 1$
a.s., then $\int_0^t C(s-)\,dC(s) = \int_0^t N(s-)\,dN(s) = (N(t)^2 - N(t))/2$, in agreement
with Example 4.4. ◊

Proof Theorem 4.11 implies $[C, C](t) = \sum_{k=1}^{N(t)} Y_k^2$ since the increments $\Delta C(s) =
C(s) - C(s-)$ are non-zero and equal to $Y_k$ at the jump times of $N(t)$. The definition
of quadratic variation in (4.27) gives $\sum_{k=1}^{N(t)} Y_k^2 = C(t)^2 - 2 \int_0^t C(s-)\,dC(s)$, which
yields the stated equality. ▲
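
The identity of Example 4.19 holds path by path and can be verified directly for any finite list of jump sizes. The sketch below is illustrative and not part of the original text; the function names and the sample jump sizes are assumptions. It evaluates $\int_0^t C(s-)\,dC(s)$ as the sum of $C(T_k-)\,Y_k$ over the jumps and compares it with $(C(t)^2 - \sum_k Y_k^2)/2$.

```python
def ito_integral_c_minus_dc(jumps):
    """Pathwise int_0^t C(s-) dC(s) = sum_k C(T_k-) * Y_k for jump sizes Y_1, ..., Y_N(t)."""
    total, c_left = 0.0, 0.0
    for y in jumps:
        total += c_left * y  # the integrand is the left limit C(T_k-)
        c_left += y          # value of the path just after the jump
    return total


def closed_form(jumps):
    """(C(t)^2 - sum_k Y_k^2) / 2, the expression stated in Example 4.19."""
    c_t = sum(jumps)
    return (c_t ** 2 - sum(y * y for y in jumps)) / 2.0
```

For unit jumps, `ito_integral_c_minus_dc([1.0] * 4)` gives $(4^2 - 4)/2 = 6$, the Poisson case $(N(t)^2 - N(t))/2$ with $N(t) = 4$.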

Example 4.20 Let $C(t) = \sum_{k=1}^{N(t)} Y_k$ be a compound Poisson process as in the previous
example. Then $\Delta[C, C](t) = Y_{N(t)}^2\,1(\Delta N(t) = 1)$, where $\Delta N(t) = N(t) -
N(t-)$. ◊

Proof If $C$ has a jump at a time $t > 0$, then $\Delta C(t) = Y_{N(t)}\,1(\Delta N(t) = 1)$. The
expression of $\Delta[C, C](t)$ results from Theorem 4.11. ▲

Definition 4.6 The quadratic covariation of $X$ and $Y$ is

$$[X, Y](t) = X(t)\,Y(t) - \int_0^t X(s-)\,dY(s) - \int_0^t Y(s-)\,dX(s). \qquad (4.29)$$

The definition coincides with (4.27) for $X = Y$. It is meaningful since $X$ and $Y$ are
semimartingales so that the stochastic integrals $\int X_-\,dY$ and $\int Y_-\,dX$ are defined
([6], Sects. 2.4 and 2.5). That (4.29) holds at $t = 0$ can be shown by using arguments
as for (4.27).

Theorem 4.13 For each $t > 0$ and partitions $p_n = \{t_0, t_1, \ldots, t_n\}$, $0 = t_0 < t_1 <
\cdots < t_n = t$, such that $\Delta(p_n) \to 0$ as $n \to \infty$, we have

$$X(0)\,Y(0) + \sum_{k=1}^{n} \big(X(t_k \wedge s) - X(t_{k-1} \wedge s)\big)\,\big(Y(t_k \wedge s) - Y(t_{k-1} \wedge s)\big) \xrightarrow{\ \mathrm{ucp}\ } [X, Y](s), \quad s \in [0, t], \qquad (4.30)$$

$[X, Y](0) = X(0)\,Y(0)$, and $\Delta[X, Y](t) = \Delta X(t)\,\Delta Y(t)$ ([6], Theorem 23, p. 61).

Theorem 4.14 The equalities

$$[X, Y] = \frac{1}{2}\,\big([X + Y, X + Y] - [X, X] - [Y, Y]\big) \quad \text{and}$$

$$\int_{0+}^{t} X(s-)\,dY(s) = X(t)\,Y(t) - \int_{0+}^{t} Y(s-)\,dX(s) - [X, Y](t) \qquad (4.31)$$

hold, and are referred to as polarization identity and integration by parts, respectively.

Proof Since the collection of semimartingales defines a linear space, the quadratic
variation process $[X + Y, X + Y]$ is defined. By properties of the stochastic integral
and the definitions of the quadratic variation and covariation processes, we have
$[X + Y, X + Y] = [X, X] + [Y, Y] + 2\,[X, Y]$, which yields the polarization identity.

The integration by parts formula is a direct consequence of (4.29). If $[X, Y] = 0$,
(4.31) coincides with the integration by parts formula of the classical calculus. The
notation $\int_{0+}^{t}$ means that the integrals are performed in $(0, t]$. The integration by
parts formula can also be used in $[0, t]$, in which case it is written as $\int_0^t X(s-)\,dY(s) =
X(t)\,Y(t) - \int_0^t Y(s-)\,dX(s) - [X, Y](t)$. That the formula holds at $t = 0$ follows by
direct calculations. ▲

Example 4.21 The quadratic covariation $[B_1, B_2]$ of two independent Brownian
motions $B_1$ and $B_2$ is zero. ◊

Proof The polarization identity gives $[B_1, B_2](t) = \big([\sqrt{2}\,B, \sqrt{2}\,B](t) - [B_1, B_1](t) -
[B_2, B_2](t)\big)/2 = 0$, where $\sqrt{2}\,B(t)$ and $B_1(t) + B_2(t)$ are versions and $B$ denotes a
Brownian motion. ▲
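
The vanishing covariation of Example 4.21 can be observed on simulated paths through the partition sums of (4.30). The sketch below is illustrative and not part of the original text; the function name, the seed, and the grid size are assumptions. It accumulates the sum of products of increments of two independent Brownian paths, which should be close to zero, together with the sum of squared increments of the first path, which should be close to $t$.

```python
import math
import random


def covariation_sums(t, n, seed=0):
    """Return (sum dB1*dB2, sum dB1**2) over a uniform partition of [0, t] with n cells."""
    rng = random.Random(seed)
    sd = math.sqrt(t / n)  # standard deviation of a Brownian increment over one cell
    cross, square = 0.0, 0.0
    for _ in range(n):
        db1 = rng.gauss(0.0, sd)
        db2 = rng.gauss(0.0, sd)  # drawn independently of db1
        cross += db1 * db2
        square += db1 * db1
    return cross, square
```

The cross sum has mean zero and standard deviation $t/\sqrt{n}$, so it shrinks as the partition is refined, whereas the squared sum concentrates around $t$.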

Definition 4.7 The path-by-path continuous part $[X, X]^c$ of $[X, X]$ is defined by

$$[X, X](t) = [X, X]^c(t) + X(0)^2 + \sum_{0 < s \le t} (\Delta X(s))^2 = [X, X]^c(t) + \sum_{0 \le s \le t} (\Delta X(s))^2. \qquad (4.32)$$

If $[X, X]^c(t) = 0$, then $X$ is said to be a quadratic pure jump semimartingale. The
quadratic variation of a quadratic pure jump semimartingale $X$ is $[X, X](t) =
\sum_{0 \le s \le t} (\Delta X(s))^2$. It can be shown that any semimartingale $X$ has a unique continuous
local martingale part $X^c$ and that $[X^c, X^c] = [X, X]^c$ ([6], p. 63).

Theorem 4.15 If $X$ is a quadratic pure jump semimartingale and $Y$ is an arbitrary
semimartingale, then ([6], Theorem 28, p. 68)

$$[X, Y](t) = X(0)\,Y(0) + \sum_{0 < s \le t} \Delta X(s)\,\Delta Y(s). \qquad (4.33)$$


Example 4.22 The quadratic covariation $[B, C]$ of a Brownian motion $B$ and the
compound Poisson process $C$ is zero, so that

$$[B + C, B + C](t) = [B, B](t) + [C, C](t) = t + \sum_{k=1}^{N(t)} Y_k^2.$$

The quadratic variation of $B + C$ varies linearly in time between the jumps of the
pure jump semimartingale $C(t)$. ◊

Proof The polarization identity (4.31) with $(B, C)$ in place of $(X, Y)$ gives $[B, C] =
\big([B + C, B + C] - [B, B] - [C, C]\big)/2$. We have already found the expressions for
the quadratic variation processes $[B, B]$ and $[C, C]$. The quadratic covariation process
$[B, C]$ is zero by (4.33) and properties of $(B, C)$. ▲

Note that $X^c = B$ for the semimartingale $X = B + C$ so that $[X, X]^c(t) = t$,
$[X, X](t) = t + \sum_{k=1}^{N(t)} Y_k^2$, and $[X, X](t) - [X, X]^c(t) = \sum_{k=1}^{N(t)} Y_k^2 = \sum_{0 < s \le t}
(\Delta C(s))^2$, where $B$ and $C$ denote a Brownian motion and a compound Poisson
process, respectively. The process $[X, X](t) - [X, X]^c(t)$ is given by the sum of the
squares of all jumps of $C$ in $[0, t]$. This result holds for any semimartingale $X$ since
semimartingales can have only jump discontinuities and the number of jumps is at
most countable. We also note that $[X, X](0) = X(0)^2$ and $[X, X]^c(0) = 0$ since the
increment $\Delta X(0) = X(0) - X(0-)$ of $X$ at $t = 0$ is $X(0)$.

Theorem 4.16 The quadratic variation and covariation processes $[X, X]$ in (4.27)
and $[X, Y]$ in (4.29) are semimartingales. Also, if $X$ has a.s. continuous sample paths
of finite variation on compacts, then $[X, X](t) = X(0)^2$.

Proof Since $[X, X]$ is an adapted process with increasing samples in $\mathcal{D}$, the process
$[X, X]$ is a semimartingale. The polarization identity and the fact that the collection
of semimartingales is a linear space show that the quadratic covariation process $[X, Y]$ is
also a semimartingale.

We also note that the product $X(t)\,Y(t)$ of two semimartingales is a semimartingale
since stochastic integrals and quadratic covariation processes are semimartingales
and $X(t)\,Y(t)$ is a linear function of these processes by (4.29).

For the sequence of partitions $p_n$ in (4.30),

$$\sum_{k=1}^{n} \big(X(t_k) - X(t_{k-1})\big)^2 \le \sup_k |X(t_k) - X(t_{k-1})| \sum_{k=1}^{n} |X(t_k) - X(t_{k-1})| \to 0$$

as $n \to \infty$ since $\sup_k |X(t_k) - X(t_{k-1})| \to 0$ and $\sum_{k=1}^{n} |X(t_k) - X(t_{k-1})| < \infty$
by the continuity and the finite variation of the samples of $X$, respectively, so that
$[X, X](t) = X(0)^2$. ▲

Example 4.23 If $X$ is a process with continuous samples of finite variation on compacts,
then $\int_0^t X(s-)\,dX(s) = \int_0^t X(s)\,dX(s) = (X(t)^2 - X(0)^2)/2$ and $[X, X](t) =
X(0)^2$. ◊

Proof We have $[X, X](t) = X(0)^2$ by Theorem 4.16 so that $\int X_-\,dX$ results from
(4.27). For example, take $X(t) = A \cos(\nu\,t)$, where $A$ is a real-valued random variable
and $\nu > 0$ is a real number. Then

$$\begin{aligned}
\int_0^t X(s-)\,dX(s) &= \int_0^t X(s)\,dX(s) = \int_0^t \big(A \cos(\nu\,s)\big)\,d\big(A \cos(\nu\,s)\big) \\
&= (A^2/2)\,\big[(\cos(\nu\,t))^2 - 1\big] = X(t)^2/2 - A^2/2
\end{aligned}$$

so that $[X, X](t) = X(t)^2 - 2\,\big(X(t)^2/2 - A^2/2\big) = A^2 = X(0)^2$. ▲
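
The bound used in the proof of Theorem 4.16 can be seen at work on the path $X(t) = A \cos(\nu\,t)$. The sketch below is illustrative and not part of the original text; the function name and the numerical values of the amplitude, frequency, and grid sizes are assumptions. Since $|X(t_k) - X(t_{k-1})| \le |A|\,\nu\,t/n$ on a uniform grid, the sum of squared increments is at most $A^2 \nu^2 t^2/n$ and vanishes as $n \to \infty$.

```python
import math


def squared_increment_sum(a, nu, t, n):
    """Sum of squared increments of X(s) = a*cos(nu*s) over a uniform partition of [0, t]."""
    total, prev = 0.0, a * math.cos(0.0)
    for k in range(1, n + 1):
        cur = a * math.cos(nu * t * k / n)
        total += (cur - prev) ** 2
        prev = cur
    return total
```

With $a = 2$, $\nu = 3$, and $t = 1$, the sum is below $36/n$ for every $n$, so the only contribution left in $[X, X](t)$ is $X(0)^2 = a^2$.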


4.8 Exercises 

Exercise 4.1 Complete the calculations in Example 4.1 showing that $J_{B,n}(B)$ converges
in m.s. to $(B(t)^2 - t)/2$.

Exercise 4.2 Calculate the distribution of the random variable $\int_0^1 B(t)\,dB(t)$, where
$B$ is a Brownian motion and $\int B\,dB$ denotes a stochastic integral.

Exercise 4.3 Find the Ito integral $\int_0^t C(s-)\,dC(s)$ and the Stratonovich integral
$\int_0^t C(s-) \circ dC(s)$ by using arguments as in Example 4.4, where $C(t)$ is a compound
Poisson process.

Exercise 4.4 Show that $\mathcal{H}_0^2[0, \tau]$ in (4.10) is a linear space and that the stochastic
integral $I(X)$ in (4.12) is linear in $X$.

Exercise 4.5 Calculate the mean and covariance functions of the process $X(t) =
\int_0^t g(s)\,dB(s)$ defined as a path by path Riemann-Stieltjes integral, where $g$ is a
real-valued function of bounded variation and $B$ denotes a Brownian motion.

Exercise 4.6 Find the Ito integrals $\int_0^t C(s-)\,dB(s)$ and $\int_0^t B(s)\,dC(s)$ by following
the approach in Examples 4.1 and 4.4.

Exercise 4.7 Calculate the mean and variance of the random variables defined by
the Ito integrals $\int_0^1 |B(t)|\,dB(t)$, $\int_0^1 \sqrt{t}\,\exp(B(t))\,dB(t)$, and $\int_0^1 \sqrt{t}\,\sin(B(t))\,dB(t)$,
where $B$ is a Brownian motion and the integrals are stochastic integrals.

Exercise 4.8 Show that $X(t) = \int_0^t g(s)\,dB(s)$, $0 \le t \le \tau$, is a Gaussian process with
mean 0 and covariance function $E[X(s)\,X(t)] = \int_0^{s \wedge t} g(u)^2\,du$, where $g : \mathbb{R} \to \mathbb{R}$ is
of bounded variation.

Exercise 4.9 Use Ito’s isometry to calculate the variance of $\int_0^t |B(s)|^{1/2}\,dB(s)$ and
$\int_0^t (B(s) + s)^2\,dB(s)$.

Hint Ito’s isometry gives $E[(\int_0^t |B(s)|^{1/2}\,dB(s))^2] = E[\int_0^t |B(s)|\,ds]$ so that the variance
of the first integral can be calculated from $\int_0^t E[|B(s)|]\,ds$ by Fubini’s theorem.

Exercise 4.10 Complete the calculations in Example 4.9 showing that the compensator
of $N(t) - \lambda\,t$ is $\lambda\,t$.

Exercise 4.11 The process $M(t) = B(t)^2 - t$ is a martingale with respect to the
filtration generated by the Brownian motion $B$. Show that the compensator of $M(t)$
in Doob-Meyer’s decomposition is $\langle M \rangle(t) = 4 \int_0^t B(s)^2\,ds$.

Exercise 4.12 Show that $M(t) = \exp(a\,B(t) - a^2\,t/2)$, $a > 0$, is a martingale and
that the compensator of $M(t)^2$ is $\langle M \rangle(t) = a^2 \int_0^t \exp(2\,a\,B(s) - a^2\,s)\,ds$.

Exercise 4.13 Show that $C(t)^2$ is a semimartingale, where $C(t)$ denotes a compound
Poisson process as in Example 4.12.

Exercise 4.14 Find the quadratic variation of the semimartingale Y in (4.26).

Exercise 4.15 Let $N(t)$ be a compensated Poisson process that is independent of a
Brownian motion $B$. Find the variance of $\int_0^t B(s)\,dN(s)$ and $\int_0^t \exp(B(s))\,dN(s)$.

Exercise 4.16 Find and plot the quadratic variation for the approximate representation
$L_{\alpha,a}(t)$ of an $\alpha$-stable process $L_\alpha(t)$ defined by (3.54) for several values of
$\alpha \in (1, 2]$.

Chapter 5

Ito’s Formula and Applications 


5.1 Introduction 

The Ito formula extends the change of variable formula of the classical calculus to 
stochastic integrals of the type examined in the previous chapter, and constitutes an 
essential tool for solving stochastic problems encountered in physics and engineering. 
The classical change of variables formula,

$$g(h(t)) - g(h(0)) = \int_{h(0)}^{h(t)} g'(u)\,du = \int_0^t g'(h(s))\,dh(s), \qquad (5.1)$$

gives the increment of a deterministic real-valued function $t \mapsto g(h(t))$ in an interval
$[0, t]$. The differential form of this formula is

$$\frac{d}{dt}\,[g(h(t))] = g'(h(t))\,h'(t) \quad \text{or} \quad d\,[g(h(t))] = g'(h(t))\,dh(t), \qquad (5.2)$$

where $g'$ and $h'$ denote the first derivatives of functions $g$ and $h$. The Ito formula
extends the rules of classical calculus to the case in which the deterministic function
$h$ is replaced with a semimartingale $X$.

The change of variables formula in (5.1) gives $B(t)^2/2 = \int_0^t B(s)\,dB(s)$ for $g(y) =
y^2/2$ and $h$ replaced by a Brownian motion process $B$. The result is in disagreement
with the Ito integral $\int_0^t B(s)\,dB(s) = B(t)^2/2 - t/2$ in (4.6) but is consistent with the
Stratonovich integral $\int_0^t B(s) \circ dB(s) = B(t)^2/2$ given by (4.8).
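
The difference between the two integrals can be reproduced on a simulated Brownian path. The sketch below is illustrative and not part of the original text; the function name, the seed, and the grid size are assumptions, and the Stratonovich-type sum is assumed to use the average of the endpoint values of the integrand in each cell. The averaged sum telescopes to $B(t)^2/2$ exactly, while the left-point sum differs from it by half the sum of squared increments, which is close to $t/2$.

```python
import math
import random


def left_and_averaged_sums(t, n, seed=0):
    """Return (B(t), left-point sum, averaged-endpoint sum) for one simulated Brownian path."""
    rng = random.Random(seed)
    sd = math.sqrt(t / n)
    b, left_sum, avg_sum = 0.0, 0.0, 0.0
    for _ in range(n):
        db = rng.gauss(0.0, sd)
        left_sum += b * db              # Ito-type sum: integrand at the left endpoint
        avg_sum += (b + 0.5 * db) * db  # Stratonovich-type sum: average of the endpoints
        b += db
    return b, left_sum, avg_sum
```

For a fine grid, `left_sum` is close to $B(t)^2/2 - t/2$ and `avg_sum` equals $B(t)^2/2$ up to rounding.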


5.2 Ito’s Formula for $\mathbb{R}$-Valued Semimartingales

We establish the change of variable formula for real-valued functions $t \mapsto g(X(t))$,
where $g \in C^2(\mathbb{R})$ and $X$ denotes a real-valued semimartingale. The formula, referred
to as Ito’s formula, is established for continuous semimartingales and is extended
subsequently to arbitrary semimartingales. Examples are used to illustrate the application
and usefulness of Ito’s formula.


5.2.1 Continuous Semimartingales 

This section outlines the essential steps of the proof of Ito’s formula. Additional 
technical details can be found in [15] (Theorem 32, p. 71). A heuristic derivation of 
the Ito formula is in [11] (Theorem 6.7.1). 

Theorem 5.1 If $X$ is a continuous semimartingale and $g \in C^2(\mathbb{R})$, then $g(X)$ is a
continuous semimartingale and, for all $t \ge 0$, the integral and differential forms,

$$g(X(t)) - g(X(0)) = \int_0^t g'(X(s))\,dX(s) + \frac{1}{2} \int_0^t g''(X(s))\,d[X, X](s) \quad \text{and}$$

$$dg(X(t)) = g'(X(t))\,dX(t) + \frac{1}{2}\,g''(X(t))\,d[X, X](t), \qquad (5.3)$$

of Ito’s formula hold with probability 1.

Proof If (5.3) holds, $g(X)$ is a semimartingale since (1) the processes $g'(X)$ and $g''(X)$
are adapted as memoryless transformations of the adapted process $X$, (2) $g'(X)$ and
$g''(X)$ have continuous samples since $g \in C^2(\mathbb{R})$ and $X$ is continuous by assumption,
(3) the integrals $\int_0^t g'(X(s))\,dX(s)$ and $\int_0^t g''(X(s))\,d[X, X](s)$ are semimartingales
by a preservation property of the stochastic integral ([15], Theorem 20, p. 56 and
Theorem 17, p. 106, and Theorem 4.10 in this book), and (4) sums of semimartingales
are semimartingales.

The Taylor formula cannot be applied directly to $g(X)$ since $X$ may not take
values in a bounded interval, but it can be used for $X$ stopped at time $T_a = \inf\{t \ge
0 : |X(t)| \ge a\}$, $0 < a < \infty$. Since $a$ is arbitrary, results established for $X$ stopped
at $T_a$ hold for $a \to \infty$, that is, for $X$.

The Taylor formula shows that the increment $g(x + h) - g(x)$ of function $g$ in
$[x, x + h]$, $h > 0$, has the form

$$g(x + h) - g(x) = h\,g'(x) + \frac{h^2}{2}\,g''(x) + r(x, h), \quad h \in \mathbb{R}, \qquad (5.4)$$

where $|r(x, h)| \le h^2\,\alpha(|h|)$, $\alpha : [0, \infty) \to [0, \infty)$ is increasing, and $\lim_{u \downarrow 0} \alpha(u) =
0$. Fix $t > 0$ and consider a sequence of partitions $p_n = \{t_0, t_1, \ldots, t_n\}$, $0 = t_0 <
t_1 < \cdots < t_n = t$, of $[0, t]$ such that $\Delta(p_n) \to 0$ as $n \to \infty$. Then

$$g(X(t)) - g(X(0)) = \sum_{k=1}^{n} \big(g(X(t_k)) - g(X(t_{k-1}))\big) = S_1 + S_2 + S_3, \qquad (5.5)$$

where

$$\begin{aligned}
S_1 &= \sum_{k=1}^{n} g'(X(t_{k-1}))\,\big(X(t_k) - X(t_{k-1})\big), \\
S_2 &= \frac{1}{2} \sum_{k=1}^{n} g''(X(t_{k-1}))\,\big(X(t_k) - X(t_{k-1})\big)^2, \\
S_3 &= \sum_{k=1}^{n} r\big(X(t_{k-1}), X(t_k) - X(t_{k-1})\big)
\end{aligned}$$

correspond to the three terms of the Taylor formula in (5.4) applied to the increments
$g(X(t_k)) - g(X(t_{k-1}))$. The sums $S_1$ and $S_2$ converge in probability to the Ito integrals
$\int_0^t g'(X(s))\,dX(s)$ and $(1/2) \int_0^t g''(X(s))\,d[X, X](s)$ as $n \to \infty$, respectively ([15],
Theorem 21, p. 57, and Theorem 30, p. 69, and Sects. 4.4, 4.5, and 4.6 in this book).
The absolute value of $S_3$ can be bounded by

$$|S_3| = \Big|\sum_{k=1}^{n} r\big(X(t_{k-1}), X(t_k) - X(t_{k-1})\big)\Big| \le \max_{1 \le k \le n} \alpha\big(|X(t_k) - X(t_{k-1})|\big) \sum_{k=1}^{n} \big(X(t_k) - X(t_{k-1})\big)^2$$

for each $n$. Since $X$ has continuous samples, the function $s \mapsto X(s, \omega)$ is uniformly
continuous in $[0, t]$ for almost all $\omega$, so that $\max_k |X(t_k) - X(t_{k-1})| \to 0$ a.s. as
$n \to \infty$ implying $\max_k \alpha(|X(t_k) - X(t_{k-1})|) \to 0$ a.s. We conclude that $S_3 \to 0$
in probability as $n \to \infty$ since $\max_{1 \le k \le n} \alpha(|X(t_k) - X(t_{k-1})|) \to 0$ a.s. and
$\sum_k (X(t_k) - X(t_{k-1}))^2 \to [X, X](t)$ in probability (Theorem 4.11).

In summary, we have shown that for each $t \ge 0$ the sequences $S_1$, $S_2$, and
$S_3$ converge in probability to $\int_0^t g'(X(s))\,dX(s)$, $(1/2) \int_0^t g''(X(s))\,d[X, X](s)$, and
zero, respectively, as $n \to \infty$. Hence, we have

$$P\left(\Big|g(X(t)) - g(X(0)) - \int_0^t g'(X(s))\,dX(s) - \frac{1}{2} \int_0^t g''(X(s))\,d[X, X](s)\Big| > \varepsilon\right) = 0$$

for any $\varepsilon > 0$, so that the first equality in (5.3) holds with probability 1. ▲

Example 5.1 The Ito formula in (5.3) applied to the mapping B i-> g(B) = B n gives 



Jo 


B(s)"- 1 dB(s) + 


n(n — 1) I" 1 

2 J 0 


f 


B(s) n ~ 2 ds, 


(5.6) 
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where n > 1 is an integer. For the special case n — 2 we have B(t) 2 = 
2 [I B(s) dB(s) + t in agreement with our calculations in Example 4.1. Note that 
Bit)' 1 is a semimartingale and that (5.6) provides a recurrence formula for calculat- 
ing the Ito integrals J ( j B" dB. O 

Proof The Ito formula in (5.3) can be used since the mapping $B \mapsto B^n$ is infinitely
differentiable and B is a continuous square integrable martingale, so that it is a
continuous semimartingale. That $g(B) = B^2$ is a semimartingale also follows from
the representation $B(t)^2 = A(t) + M(t)$, where $A(t) = t$ is an adapted continuous
process with $A(0) = 0$ and paths of finite variation on compacts and $M(t) = B(t)^2 - t$
is a square integrable martingale starting at zero. ▲
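The special case $n = 2$ of (5.6) can be checked numerically on a simulated Brownian path. The following is a minimal sketch, not the book's construction: it assumes a uniform partition of $[0, t]$, approximates the Ito integral by left-endpoint sums, and all function names are hypothetical. On any partition, $B(t)^2$ equals twice the left-endpoint sum plus the sum of squared increments exactly, and the latter sum approximates $[B,B](t) = t$.

```python
import math
import random


def brownian_increments(n_steps, t, rng):
    """Independent N(0, t/n_steps) increments of a Brownian path on [0, t]."""
    dt = t / n_steps
    return [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n_steps)]


def left_endpoint_sum(increments):
    """Return (sum_k B(t_{k-1}) * (B(t_k) - B(t_{k-1})), B(t))."""
    b, total = 0.0, 0.0
    for db in increments:
        total += b * db   # integrand evaluated at the left endpoint
        b += db
    return total, b


rng = random.Random(0)            # fixed seed, chosen arbitrarily
t, n = 1.0, 100_000
dB = brownian_increments(n, t, rng)
ito, b_t = left_endpoint_sum(dB)
quad_var = sum(db * db for db in dB)   # approximates [B, B](t) = t

print("B(t)^2                :", b_t ** 2)
print("2 * Ito sum + quad var:", 2.0 * ito + quad_var)
print("quadratic variation   :", quad_var)
```

The first two printed numbers agree to rounding error for every path, while the third is only close to $t$; replacing it by $t$ is what the limit $n \to \infty$ accomplishes.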


5.2.2 Arbitrary Semimartingales 


The extension of the Ito formula to arbitrary semimartingales is based on the fact that 
semimartingales have at most countable numbers of jumps in bounded time intervals 
([15], Sect. 1.1) and continuous samples between consecutive jumps. 

Theorem 5.2 If X is a semimartingale and $g \in C^2(\mathbb{R})$, then g(X) is a semimartingale
and, for all $t \ge 0$, the integral form,

$$g(X(t)) - g(X(0)) = \int_{0+}^t g'(X(s-))\,dX(s) + \frac{1}{2} \int_{0+}^t g''(X(s-))\,d[X,X]^c(s) + \sum_{0 < s \le t} \big[g(X(s)) - g(X(s-)) - g'(X(s-))\,\Delta X(s)\big], \tag{5.7}$$

of Ito’s formula holds with probability 1.

The Ito formula can also be given in the form

$$g(X(t)) - g(X(0)) = \int_{0+}^t g'(X(s-))\,dX(s) + \frac{1}{2} \int_{0+}^t g''(X(s-))\,d[X,X](s) + \sum_{0 < s \le t} \Big[g(X(s)) - g(X(s-)) - g'(X(s-))\,\Delta X(s) - \frac{1}{2}\, g''(X(s-))\,(\Delta X(s))^2\Big] \tag{5.8}$$

by using the relationship between $[X,X]$ and $[X,X]^c$ (Definition 4.7). The Ito formulas
in (5.7) and (5.8) do not include the jumps of X at 0. To include these jumps,
it is sufficient to write $\int_0^t$ and $\sum_{0 \le s \le t}$ in place of $\int_{0+}^t$ and $\sum_{0 < s \le t}$.

Proof If (5.7) holds, g(X) is a semimartingale since (1) the processes $g'(X)$ and
$g''(X)$ are adapted as memoryless transformations of the adapted process X, (2)
$g'(X)$ and $g''(X)$ are adapted processes that have right continuous samples with
left limits, since, for example, $\lim_{s \downarrow t} g'(X(s)) = g'(\lim_{s \downarrow t} X(s)) = g'(X(t))$ and
$\lim_{s \uparrow t} g'(X(s)) = g'(\lim_{s \uparrow t} X(s)) = g'(X(t-))$, (3) the integrals $\int_0^t g'(X(s-))\,dX(s)$
and $\int_0^t g''(X(s-))\,d[X,X]^c(s)$ are semimartingales by a preservation property of the
stochastic integral ([15], Theorem 20, p. 56, and Theorem 17, p. 106), (4) the summation
in (5.7) is a semimartingale (Exercise 5.2), and (5) sums of semimartingales
are semimartingales.

Let $[t_{i-1}, t_i)$ denote a time interval between two consecutive jumps of X in a time
interval $(0, t]$. We have

$$g(X(t_i)) - g(X(t_{i-1})) = \big[g(X(t_i)) - g(X(t_i-))\big] + \big[g(X(t_i-)) - g(X(t_{i-1}))\big],$$

where the first and the second terms on the right side of this equation correspond
to the jump of X at $t_i$ and the change of X in the continuity interval $[t_{i-1}, t_i)$. The
contributions of terms associated with the jumps and the continuity intervals of X in
$(0, t]$ are $\sum_{0 < s \le t} [g(X(s)) - g(X(s-))]$ and (Theorem 5.1)

$$g(X(t_i-)) - g(X(t_{i-1})) \simeq g'(X(t_{i-1}))\,\big(X(t_i-) - X(t_{i-1})\big) + \frac{1}{2}\, g''(X(t_{i-1}))\,\big(X(t_i-) - X(t_{i-1})\big)^2. \tag{5.9}$$

The first term on the right side of (5.9) has the representation $g'(X(t_{i-1}))\,(X(t_i-) -
X(t_{i-1})) = g'(X(t_{i-1}))\,[X(t_i) - X(t_{i-1})] - g'(X(t_{i-1}))\,[X(t_i) - X(t_i-)]$, so that
its contribution in $(0, t]$ becomes

$$\sum_i g'(X(t_{i-1}))\,[X(t_i) - X(t_{i-1})] \to \int_{0+}^t g'(X(s-))\,dX(s) \quad \text{in probability},$$

$$\sum_i g'(X(t_{i-1}))\,[X(t_i) - X(t_i-)] = \sum_{0 < s \le t} g'(X(s-))\,[X(s) - X(s-)],$$

as the mesh $\Delta(p_n) = \max_{1 \le k \le n} (t_k - t_{k-1})$ of partition $p_n = \{t_0, t_1, \ldots, t_n\}$, $0 =
t_0 < t_1 < \cdots < t_n = t$, of $(0, t]$ approaches 0 (Sect. 4.5). The contribution of the
second term on the right side of (5.9) in $(0, t]$ satisfies

$$\frac{1}{2} \sum_i g''(X(t_{i-1}))\,\big(X(t_i-) - X(t_{i-1})\big)^2 \to \frac{1}{2} \int_{0+}^t g''(X(s-))\,d[X,X]^c(s) \quad \text{in probability},$$

as $\Delta(p_n) \to 0$ by Theorem 4.11 giving the convergence $X(0)^2 + \sum_{k=1}^n (X(t_k) -
X(t_{k-1}))^2 \to [X,X](t)$ in probability. These observations yield the Ito formula in
(5.7). The version of this formula in (5.8) follows from the relationship $[X,X](t) =
[X,X]^c(t) + X(0)^2 + \sum_{0 < s \le t} (\Delta X(s))^2$ between the processes $[X,X]$ and $[X,X]^c$
(Definition 4.7). ▲

Example 5.2 The process $N^2$ is a semimartingale and

$$N(t)^2 = 2 \int_{0+}^t N(s-)\,dN(s) + N(t), \tag{5.10}$$

where N is a Poisson process with intensity $\lambda > 0$. ◊

Proof Theorem 5.2 shows that $N^2$ is a semimartingale. Ito’s formula applied to
$g(N) = N^2$ gives

$$N(t)^2 - N(0)^2 = \int_{0+}^t 2 N(s-)\,dN(s) + \frac{1}{2} \int_{0+}^t 2\,d[N,N]^c(s) + \sum_{0 < s \le t} \big[N(s)^2 - N(s-)^2 - 2 N(s-)\,\Delta N(s)\big].$$

Since N is a quadratic pure jump semimartingale, $[N,N]^c(t) = 0$ for all $t \ge 0$ so
that $\int_{0+}^t 2\,d[N,N]^c(s) = 0$. The above summation has the alternative form

$$\sum_{i=1}^{N(t)} \big[N(T_i)^2 - N(T_{i-1})^2 - 2 N(T_{i-1})\big] = \sum_{i=1}^{N(t)} \big[i^2 - (i-1)^2 - 2(i-1)\big] = N(t),$$

where $\{T_i\}$ denote the jump times of N, which gives (5.10) since $N(0) = 0$. ▲

Example 5.3 Let N(t) be a Poisson process as in the previous example. The recurrence
formula

$$\int_{0+}^t N(s-)^n\,dN(s) = \int_{0+}^t N(s-)^{n-1}\,dN(s) + \frac{N(t)^{n+1}}{n+1} - \frac{N(t)^n}{n} - \sum_{i=1}^{N(t)} \left[\frac{i^{n+1} - (i-1)^{n+1} - (n+1)(i-1)^n}{n+1} - \frac{i^n - (i-1)^n - n\,(i-1)^{n-1}}{n}\right] \tag{5.11}$$

holds, where $n \ge 1$ is an integer. This formula gives $\int_{0+}^t N(s-)\,dN(s) = (N(t)^2 -
N(t))/2$ for $n = 1$ in agreement with (5.10). ◊


Proof Ito’s formula applied to the mapping $N(t) \mapsto N(t)^n$ gives

$$N(t)^n = n \int_{0+}^t N(s-)^{n-1}\,dN(s) + \sum_{0 < s \le t} \big[N(s)^n - N(s-)^n - n N(s-)^{n-1}\,\Delta N(s)\big]$$

$$\phantom{N(t)^n} = n \int_{0+}^t N(s-)^{n-1}\,dN(s) + \sum_{i=1}^{N(t)} \big[N(T_i)^n - N(T_{i-1})^n - n N(T_{i-1})^{n-1}\big],$$

where $\{T_i\}$ denote the jump times of N and the latter summation can be written as
$\sum_{i=1}^{N(t)} [i^n - (i-1)^n - n\,(i-1)^{n-1}]$. The recurrence formula in (5.11) results by
dividing the above equation by $n$, dividing the corresponding equation for $n + 1$ by
$n + 1$, and subtracting the former from the latter. ▲
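Because $N(T_i-) = i - 1$ at the $i$th jump, the integrals in (5.10) and (5.11) reduce pathwise to finite sums, $\int_{0+}^t N(s-)^p\,dN(s) = \sum_{i=1}^{N(t)} (i-1)^p$, so both formulas can be verified exactly for a given number of jumps. The sketch below does this with rational arithmetic; the function names are hypothetical and the convention $0^0 = 1$ is assumed for $p = 0$.

```python
from fractions import Fraction


def ito_integral(n_jumps, p):
    """Pathwise value of the integral of N(s-)^p dN(s) over (0, t] when N(t) = n_jumps."""
    return sum(Fraction(i - 1) ** p for i in range(1, n_jumps + 1))


def recurrence_rhs(n_jumps, n):
    """Right side of the recurrence (5.11) for a path with N(t) = n_jumps."""
    big_n = Fraction(n_jumps)
    correction = sum(
        Fraction(i ** (n + 1) - (i - 1) ** (n + 1) - (n + 1) * (i - 1) ** n, n + 1)
        - Fraction(i ** n - (i - 1) ** n - n * (i - 1) ** (n - 1), n)
        for i in range(1, n_jumps + 1)
    )
    return (ito_integral(n_jumps, n - 1)
            + big_n ** (n + 1) / (n + 1) - big_n ** n / n - correction)


# (5.10): N(t)^2 = 2 * integral of N(s-) dN(s) + N(t)
check_510 = all(k ** 2 == 2 * ito_integral(k, 1) + k for k in range(0, 10))

# (5.11): the recurrence reproduces the integral of N(s-)^n dN(s)
check_511 = all(
    recurrence_rhs(k, n) == ito_integral(k, n)
    for k in range(0, 10) for n in range(1, 6)
)

print(check_510, check_511)
```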

Example 5.4 Let $C(t) = \sum_{k=1}^{N(t)} Y_k$ be a compound Poisson process, where N(t) is a
Poisson process with intensity $\lambda > 0$ and $\{Y_k\}$ are iid real-valued random variables.
The process

$$C(t)^2 = 2 \int_{0+}^t C(s-)\,dC(s) + \sum_{i=1}^{N(t)} Y_i^2 \tag{5.12}$$

is a semimartingale. ◊

Proof Ito’s formula in (5.7) applied to $g(C(t)) = C(t)^2$ gives

$$C(t)^2 - C(0)^2 = \int_{0+}^t 2\,C(s-)\,dC(s) + \frac{1}{2} \int_{0+}^t 2\,d[C,C]^c(s) + \sum_{0 < s \le t} \big[C(s)^2 - C(s-)^2 - 2\,C(s-)\,\Delta C(s)\big]. \tag{5.13}$$

Since C is a quadratic pure jump semimartingale, $\int_{0+}^t d[C,C]^c(s) = 0$. Moreover,

$$\sum_{0 < s \le t} \big[C(s)^2 - C(s-)^2 - 2\,C(s-)\,\Delta C(s)\big] = \sum_{i=1}^{N(t)} \big[C(T_i)^2 - C(T_{i-1})^2 - 2\,C(T_{i-1})\,Y_i\big]$$

$$= \sum_{i=1}^{N(t)} \Big[\Big(\sum_{k=1}^{i} Y_k\Big)^2 - \Big(\sum_{k=1}^{i-1} Y_k\Big)^2 - 2 \Big(\sum_{k=1}^{i-1} Y_k\Big) Y_i\Big] = \sum_{i=1}^{N(t)} Y_i^2,$$

where the latter equality holds since the square bracket has the form $(Z + Y_i)^2 -
Z^2 - 2 Z Y_i$ with $Z = \sum_{k=1}^{i-1} Y_k$ so that it is $Y_i^2$. Hence, the summation in (5.13) is
a compound Poisson process with jumps $\{Y_i^2\}$ occurring at the jump times $T_i$ of
C. These considerations and $C(0) = 0$ give (5.12). Note that (5.12) provides an
alternative definition for the stochastic integral $\int C_-\,dC$. ▲
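The pathwise identity behind this proof, $C(t)^2 = 2 \sum_i C(T_i-)\,Y_i + \sum_i Y_i^2$, is easy to confirm on a simulated sample. The following sketch is illustrative only: the standard Gaussian jump law, the rate, and the function names are assumptions made here, not part of the example.

```python
import random


def compound_poisson_jumps(rate, t, rng):
    """Jump sizes Y_1, ..., Y_{N(t)} of one sample of C on (0, t]."""
    jumps = []
    s = rng.expovariate(rate)          # first jump time
    while s <= t:
        jumps.append(rng.gauss(0.0, 1.0))   # assumed jump distribution
        s += rng.expovariate(rate)     # next interarrival time
    return jumps


def integral_c_minus_dc(jumps):
    """Return (sum_i C(T_i-) * Y_i, C(t)) for the given jump sizes."""
    c, total = 0.0, 0.0
    for y in jumps:
        total += c * y    # integrand is the value just before the jump
        c += y
    return total, c


rng = random.Random(3)                 # arbitrary fixed seed
jumps = compound_poisson_jumps(rate=5.0, t=10.0, rng=rng)
integral, c_t = integral_c_minus_dc(jumps)
sum_sq = sum(y * y for y in jumps)

print("C(t)^2                    :", c_t ** 2)
print("2 * integral + sum of Y^2 :", 2.0 * integral + sum_sq)
```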

Example 5.5 Let C be the compound Poisson process in the previous example and g
a real-valued function with continuous second order derivative. An alternative form
of the Ito formula in (5.7) is

$$g(C(t)) - g(C(0)) = \int_{0+}^t \int_{\mathbb{R}} \big[g(C(s-) + y) - g(C(s-))\big]\,\mathcal{M}(ds, dy), \tag{5.14}$$

where $\mathcal{M}(ds, dy)$ denotes a random measure giving the number of jumps of C in the
rectangle $(s, s + ds] \times (y, y + dy]$ (Definition 3.38). ◊

Proof Ito’s formula in (5.7) applied to g(C(t)) gives

$$g(C(t)) - g(C(0)) = \int_{0+}^t g'(C(s-))\,dC(s) + \frac{1}{2} \int_{0+}^t g''(C(s-))\,d[C,C]^c(s) + \sum_{0 < s \le t} \big[g(C(s)) - g(C(s-)) - g'(C(s-))\,\Delta C(s)\big] = \sum_{0 < s \le t} \big[g(C(s)) - g(C(s-))\big]$$

since $\int_{0+}^t g'(C(s-))\,dC(s) = \sum_{0 < s \le t} g'(C(s-))\,\Delta C(s)$ and $[C,C]^c = 0$. We have

$$\sum_{0 < s \le t} \big[g(C(s)) - g(C(s-))\big] = \sum_{0 < s \le t} \big[g(C(s-) + \Delta C(s)) - g(C(s-))\big]$$

or $\sum_{0 < s \le t} [g(C(s)) - g(C(s-))] = \int_{0+}^t \int_{\mathbb{R}} [g(C(s-) + y) - g(C(s-))]\,\mathcal{M}(ds, dy)$. ▲


5.3 Ito’s Formula for $\mathbb{R}^d$-Valued Semimartingales


Ito’s formulas are given for functions $g : \mathbb{R}^d \to \mathbb{R}$ of $\mathbb{R}^d$-valued semimartingales X.
It is assumed that g has continuous second order partial derivatives.

Theorem 5.3 If the coordinates of X are continuous semimartingales, then g(X) is
a continuous semimartingale and, for all $t \ge 0$, the integral and differential forms,

$$g(X(t)) - g(X(0)) = \sum_{i=1}^d \int_0^t \frac{\partial g(X(s))}{\partial x_i}\,dX_i(s) + \frac{1}{2} \sum_{i,j=1}^d \int_0^t \frac{\partial^2 g(X(s))}{\partial x_i\,\partial x_j}\,d[X_i, X_j](s)$$

and

$$dg(X(t)) = \sum_{i=1}^d \frac{\partial g(X(t))}{\partial x_i}\,dX_i(t) + \frac{1}{2} \sum_{i,j=1}^d \frac{\partial^2 g(X(t))}{\partial x_i\,\partial x_j}\,d[X_i, X_j](t), \tag{5.15}$$

of Ito’s formula hold with probability 1.

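Theorem 5.4 If the coordinates of X are arbitrary semimartingales, then g(X) is a
semimartingale and, for all $t \ge 0$, the integral form,

$$g(X(t)) - g(X(0)) = \sum_{i=1}^d \int_{0+}^t \frac{\partial g(X(s-))}{\partial x_i}\,dX_i(s) + \frac{1}{2} \sum_{i,j=1}^d \int_{0+}^t \frac{\partial^2 g(X(s-))}{\partial x_i\,\partial x_j}\,d[X_i, X_j]^c(s) + \sum_{0 < s \le t} \Big[g(X(s)) - g(X(s-)) - \sum_{i=1}^d \frac{\partial g(X(s-))}{\partial x_i}\,\Delta X_i(s)\Big], \tag{5.16}$$

of Ito’s formula holds with probability 1.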

The proof of these theorems follows from arguments similar to those employed
to obtain Ito’s formula for real-valued semimartingales ([15], Theorem 33, p. 74).
The integral form of Ito’s formula in (5.15) is a special case of (5.16) since $X(s-) =
X(s)$, $\Delta X_i(s) = 0$, and $[X_i, X_j]^c(s) = [X_i, X_j](s)$ for continuous semimartingales.


Example 5.6 Let $X(t) = \exp[(c - \sigma^2/2)\,t + \sigma B(t)]$, $t \ge 0$, where B denotes a
standard Brownian motion and $c, \sigma \in \mathbb{R}$ are constants. This process is a continuous
semimartingale satisfying the stochastic differential equation $dX(t) = c\,X(t)\,dt +
\sigma X(t)\,dB(t)$, $t \ge 0$, with initial state $X(0) = 1$. The process is referred to as geometric
Brownian motion. ◊

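Proof The differential form of Ito’s formula in (5.15) applied to the mapping $(t, B(t)) \mapsto
X(t) = g(t, B(t)) = \exp[(c - \sigma^2/2)\,t + \sigma B(t)]$ gives

$$dX(t) = \frac{\partial g(t, B(t))}{\partial t}\,dt + \frac{\partial g(t, B(t))}{\partial B(t)}\,dB(t) + \frac{1}{2}\,\frac{\partial^2 g(t, B(t))}{\partial B(t)^2}\,dt,$$

which yields the stated differential equation for X since $\partial g/\partial t = (c - \sigma^2/2)\,X(t)$,
$\partial g/\partial B = \sigma X(t)$, and $\partial^2 g/\partial B^2 = \sigma^2 X(t)$. ▲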

Example 5.7 Let $g : \mathbb{R}^d \to \mathbb{R}$ be a real-valued function with continuous second
order partial derivatives. Let B be an $\mathbb{R}^d$-valued Brownian motion whose coordinates
$B_i$ are independent Brownian motions starting at $B_i(0) = x_i$, $i = 1, \ldots, d$. The
generator of B is

$$\mathcal{A}[g(x)] = \lim_{t \downarrow 0} \frac{E^x[g(B(t))] - g(x)}{t} = \frac{1}{2} \sum_{i=1}^d \frac{\partial^2 g(x)}{\partial x_i^2} = \frac{1}{2}\,\Delta g(x),$$

where $E^x[\cdot] = E[\cdot \mid B(0) = x]$, $x = (x_1, \ldots, x_d) \in \mathbb{R}^d$, and $\Delta = \sum_{i=1}^d \partial^2/\partial x_i^2$
denotes the Laplace operator. ◊

Proof The Ito formula in (5.15) gives

$$g(B(t)) - g(B(0)) = \sum_{i=1}^d \int_0^t \frac{\partial g(B(s))}{\partial x_i}\,dB_i(s) + \frac{1}{2} \sum_{i=1}^d \int_0^t \frac{\partial^2 g(B(s))}{\partial x_i^2}\,ds$$

since $d[B_i, B_j](s) = \delta_{ij}\,ds$, where $\delta_{ij} = 1$ for $i = j$ and $\delta_{ij} = 0$ for $i \ne j$ (Example 4.21).
The integrals in the first summation are $\mathcal{F}_t = \sigma(B(s) : 0 \le s \le t)$-martingales
starting at zero so that their expectation is zero. The integrals in the
second summation can be defined as Riemann integrals and can be approximated
by $t\,\partial^2 g(B(\theta(\omega)\,t, \omega))/\partial x_i^2$, $\theta(\omega) \in (0, 1)$, for almost all $\omega$'s as $t \downarrow 0$ so that their
limits scaled by $t$ converge to $\partial^2 g(x)/\partial x_i^2$ as $t \downarrow 0$. ▲

Example 5.8 A stochastic process X is a standard Brownian motion if and only if
it is a continuous local martingale with $X(0) = 0$ and $[X,X](t) = t$. This fact is
referred to as Lévy’s characterization theorem ([12], Theorem 8.4.2). ◊

Proof If X is a Brownian motion, it has the stated properties. Suppose now that X
is a continuous local martingale with $X(0) = 0$ and $[X,X](t) = t$ and set $Z(t) =
g(X(t), t) = \exp(i u X(t) + u^2 t/2)$, $u \in \mathbb{R}$. The Ito formula applied to $Z(t) =
g(X(t), t)$ gives

$$Z(t) = Z(0) + \int_0^t \frac{\partial g(X(s), s)}{\partial x}\,dX(s) + \int_0^t \frac{\partial g(X(s), s)}{\partial s}\,ds + \frac{1}{2} \int_0^t \frac{\partial^2 g(X(s), s)}{\partial x^2}\,d[X,X](s)$$

$$= 1 + i u \int_0^t Z(s)\,dX(s) + \frac{u^2}{2} \int_0^t Z(s)\,ds - \frac{u^2}{2} \int_0^t Z(s)\,d[X,X](s)$$

$$= 1 + i u \int_0^t Z(s)\,dX(s).$$

Note that Z(t) is a complex-valued local martingale by a preservation property
of the stochastic integral ([15], Theorem 17, p. 107). The martingale property,
$E[Z(t) \mid \mathcal{F}_s] = Z(s)$, $t \ge s$, implies $E[e^{i u (X(t) - X(s))} \mid \mathcal{F}_s] = e^{-u^2 (t - s)/2}$ for
each $u \in \mathbb{R}$ so that $X(t) - X(s)$ is independent of $\mathcal{F}_s$ and is normally distributed
with mean zero and variance $t - s$. In summary, X starts at zero, is $\mathcal{F}_t$-adapted, has
continuous samples, and has stationary Gaussian increments with mean zero and
variance $t - s$ that are independent of the past, so that it is a Brownian motion. ▲


5.4 Ito and Stratonovich Integrals


The Ito and Stratonovich integrals with Brownian motion integrands and integrators
have been defined in Examples 4.1 and 4.2. They are $\int_0^t B(s)\,dB(s) = (B(t)^2 - t)/2$
and $\int_0^t B(s) \circ dB(s) = B(t)^2/2$, and are related by

$$\int_0^t B(s) \circ dB(s) = \int_0^t B(s)\,dB(s) + \frac{t}{2} = \int_0^t B(s)\,dB(s) + \frac{1}{2}\,[B,B](t) = \int_0^t B(s)\,dB(s) + \frac{1}{2}\,[B,B]^c(t), \tag{5.18}$$

where the latter equality is valid since $[B,B](t) = [B,B]^c(t)$. In Example 4.4, we
have noted that the Ito, the Stratonovich, and the path-by-path Riemann-Stieltjes
definitions of $\int N_-\,dN$ coincide. Note also that

$$\int_0^t N(s-) \circ dN(s) = \int_0^t N(s-)\,dN(s) = \int_0^t N(s-)\,dN(s) + \frac{1}{2}\,[N,N]^c(t) \tag{5.19}$$

holds since $[N,N]^c(t) = 0$.
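The relation (5.18) can be observed on a discretized Brownian path. The sketch below is an illustration under stated assumptions: the Ito integral is approximated by left-endpoint sums and the Stratonovich integral by sums that use the average of the endpoint values of the integrand, a common discretization that is assumed here to be compatible with the definition in Example 4.2; the function name is hypothetical.

```python
import math
import random


def ito_and_stratonovich_sums(increments):
    """Left-endpoint (Ito) and endpoint-average (Stratonovich) sums for B dB."""
    b, ito, strat = 0.0, 0.0, 0.0
    for db in increments:
        ito += b * db                  # B(t_{k-1}) * dB_k
        strat += (b + 0.5 * db) * db   # (B(t_{k-1}) + B(t_k)) / 2 * dB_k
        b += db
    return ito, strat, b


rng = random.Random(7)                 # arbitrary fixed seed
t, n = 2.0, 200_000
dt = t / n
dB = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n)]
ito, strat, b_t = ito_and_stratonovich_sums(dB)

print("Stratonovich sum :", strat, " B(t)^2 / 2       :", 0.5 * b_t ** 2)
print("Ito sum          :", ito, " (B(t)^2 - t) / 2 :", 0.5 * (b_t ** 2 - t))
print("difference       :", strat - ito, " t / 2 :", 0.5 * t)
```

The endpoint-average sum telescopes to $B(t)^2/2$ exactly, whereas the gap between the two sums is half the sum of squared increments, which only approximates $t/2$.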

Definition 5.1 Let X and Y be semimartingales. The Stratonovich integral of X with
respect to Y is

$$\int_0^t X(s-) \circ dY(s) = \int_0^t X(s-)\,dY(s) + \frac{1}{2}\,[X,Y]^c(t). \tag{5.20}$$


Theorem 5.5 If at least one of the semimartingales X and Y is continuous, then

$$X(t)\,Y(t) - X(0)\,Y(0) = \int_{0+}^t X(s-) \circ dY(s) + \int_{0+}^t Y(s-) \circ dX(s). \tag{5.21}$$


Proof The integration by parts formula for semimartingales in (4.31) is

$$\int_{0+}^t X(s-)\,dY(s) = X(t)\,Y(t) - \int_{0+}^t Y(s-)\,dX(s) - [X,Y](t). \tag{5.22}$$

If X and/or Y is continuous, then $[X,Y](t) = [X,Y]^c(t) + X(0)\,Y(0)$ and (5.22)
becomes $\int_{0+}^t X(s-)\,dY(s) = X(t)\,Y(t) - \int_{0+}^t Y(s-)\,dX(s) - [X,Y]^c(t) - X(0)\,Y(0)$.
The formula in (5.21) results by adding $[X,Y]^c(t)/2$ to both sides of the latter
equation and using Definition 5.1. ▲


Theorem 5.6 If X is an $\mathbb{R}^d$-valued semimartingale and $g : \mathbb{R}^d \to \mathbb{R}$ has continuous
second order partial derivatives, then g(X) is a semimartingale and the formula

$$g(X(t)) - g(X(0)) = \sum_{i=1}^d \int_{0+}^t \frac{\partial g}{\partial x_i}(X(s-)) \circ dX_i(s) + \sum_{0 < s \le t} \Big[g(X(s)) - g(X(s-)) - \sum_{i=1}^d \frac{\partial g(X(s-))}{\partial x_i}\,\Delta X_i(s)\Big] \tag{5.23}$$

holds ([15], Theorem 21, p. 222).

The formulas in Theorems 5.4 and 5.6 give the relationship

$$\sum_{i=1}^d \int_{0+}^t \frac{\partial g}{\partial x_i}(X(s-)) \circ dX_i(s) = \sum_{i=1}^d \int_{0+}^t \frac{\partial g(X(s-))}{\partial x_i}\,dX_i(s) + \frac{1}{2} \sum_{i,j=1}^d \int_{0+}^t \frac{\partial^2 g(X(s-))}{\partial x_i\,\partial x_j}\,d[X_i, X_j]^c(s) \tag{5.24}$$

between the Stratonovich and Ito integrals. Generally, the state equations for physical
systems are driven by colored (non-white) noise so that they are interpreted as
Stratonovich stochastic differential equations. Relationships as in (5.20) and (5.24)
can be used to transform these equations into Ito stochastic differential equations that
may be simpler to solve. We will return to this topic in Sect. 5.5.1.1 (Theorem 5.10).


5.5 Applications 


We have already seen some applications of Ito’s formula. In Examples 5.1 and 5.2, the
formula has been employed to calculate the stochastic integrals $\int B\,dB$ and $\int N_-\,dN$.
Stratonovich’s integral has been defined and related to Ito’s integral in Sect. 5.4. The
differential form of Ito’s formula has been used in Example 5.6 to construct a stochastic
differential equation for a real-valued process, referred to as geometric Brownian
motion. The infinitesimal generator of Brownian motion needed to construct local
solutions for the Laplace equation has been derived in Example 5.7.

This section provides additional applications of Ito’s formula. They include essen- 
tials on stochastic differential equations driven by Brownian motion and semimartin- 
gales, Tanaka’s formula that is used to solve locally partial differential equations 
with mixed boundary conditions, and Girsanov’s theorem providing a framework for 
developing efficient Monte Carlo algorithms. 


5.5.1 Stochastic Differential Equations 

It was shown in Example 5.6 that $X(t) = \exp[(c - \sigma^2/2)\,t + \sigma B(t)]$ satisfies the
differential equation $dX(t) = c\,X(t)\,dt + \sigma X(t)\,dB(t)$ with initial state $X(0) = 1$. The
meaning of this equation is given by its integral form $X(t) = X(0) + c \int_0^t X(s)\,ds +
\sigma \int_0^t X(s)\,dB(s)$, where $\int_0^t X(s)\,ds$ and $\int_0^t X(s)\,dB(s)$ are Riemann and Ito integrals.
The differential and integral representations of X(t) are called stochastic differential
and integral equations, respectively.

This section considers stochastic differential equations of the type

$$dX(t) = a(X(t-), t)\,dt + b(X(t-), t)\,dY(t), \quad t \ge 0, \tag{5.25}$$

with integral form

$$X(t) = X(0) + \int_0^t a(X(s-), s)\,ds + \int_0^t b(X(s-), s)\,dY(s), \quad t \ge 0, \tag{5.26}$$

where a, b are $(d, 1)$, $(d, d')$-matrices whose entries are real-valued Borel measurable
functions, Y is an $\mathbb{R}^{d'}$-valued semimartingale, and X is an $\mathbb{R}^d$-valued stochastic process.
The first and second integrals in (5.26) are Riemann-Stieltjes and stochastic or Ito
integrals. As in Sect. 3.7.6.4, we refer to the formal derivatives of Brownian motion,
compound Poisson, Lévy, and semimartingale processes as Gaussian, Poisson, Lévy,
and semimartingale white noise processes.


5.5.1.1 Gaussian White Noise 

Let Y in (5.26) be an $\mathbb{R}^{d'}$-valued Brownian motion process $B = (B_1, \ldots, B_{d'})$, where
$B_i$, $i = 1, \ldots, d'$, are independent real-valued Brownian motions, so that X satisfies
the stochastic integral equation

$$X(t) = X(0) + \int_0^t a(X(s), s)\,ds + \int_0^t b(X(s), s)\,dB(s), \quad t \ge 0. \tag{5.27}$$


The process X, the matrix a, and the matrix b or $b\,b'$ in (5.27) are called diffusion
process, drift or drift coefficient, and diffusion or diffusion coefficient, respectively.
$X(s-)$ in (5.26) is changed into $X(s)$ since diffusion processes have continuous
samples, as we will see later in this section.

The solution X of (5.27) is a Markov process since B has independent increments
so that, for given $X(t_0) = x$, $t_0 \ge 0$, future states

$$X(t) = x + \int_{t_0}^t a(X(s), s)\,ds + \int_{t_0}^t b(X(s), s)\,dB(s), \quad t \ge t_0,$$

are independent of past states $X(u)$, $u < t_0$. The converse is not true. For example,
the compound Poisson process is Markov but is not a diffusion process. Moreover,
X has the strong Markov property, that is, if T is an $\mathcal{F}_t = \sigma(X(0), B(s) : 0 \le s \le t)$-stopping
time, the process $X(T + \tau)$, $\tau \ge 0$, depends only on $X(T)$.

The solution of (5.27) can be defined in the strong and the weak sense. If the initial
state X(0) is deterministic, a strong solution is a stochastic process X(t), $t \ge 0$, such
that (1) X is adapted to the filtration $\mathcal{F}_t = \sigma(B(s), 0 \le s \le t)$ generated by B, (2) X
is a function of the samples of B and of the coefficients a and b, and (3) the Riemann-Stieltjes
and Ito integrals in (5.27) are well defined at all times ([13], p. 137). If
X(0) is random, $\mathcal{F}_t$ needs to be extended to $\sigma(X(0), \mathcal{F}_t)$. We construct approximate
strong solutions in Monte Carlo studies since the noise version needs to be specified
to generate samples of B and samples of X are calculated from samples of B by
integrating (5.27) numerically.
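One common way to integrate (5.27) numerically from a given sample of B is the Euler-Maruyama scheme, $X_{k+1} = X_k + a(X_k, t_k)\,\Delta t + b(X_k, t_k)\,\Delta B_k$. The scheme is not prescribed by the text; it is used here as an illustrative choice for the scalar case $d = d' = 1$, and the function name and parameter values are hypothetical. The sketch applies it to geometric Brownian motion and compares the result with the closed-form expression of Example 5.6 evaluated on the same Brownian sample.

```python
import math
import random


def euler_maruyama(a, b, x0, t_end, n_steps, rng):
    """Approximate X(t_end) for dX = a(X, t) dt + b(X, t) dB; also return B(t_end)."""
    dt = t_end / n_steps
    x, b_path = x0, 0.0
    for k in range(n_steps):
        db = rng.gauss(0.0, math.sqrt(dt))      # Brownian increment over [t_k, t_k + dt]
        x += a(x, k * dt) * dt + b(x, k * dt) * db
        b_path += db
    return x, b_path


c, sigma = 0.1, 0.2                              # illustrative parameter values
x_em, b_end = euler_maruyama(
    lambda x, s: c * x, lambda x, s: sigma * x,
    x0=1.0, t_end=1.0, n_steps=10_000, rng=random.Random(1),
)
x_exact = math.exp((c - 0.5 * sigma ** 2) * 1.0 + sigma * b_end)

print("Euler-Maruyama:", x_em)
print("closed form   :", x_exact)
```

Because both values are driven by the same increments of B, their difference measures the pathwise (strong) error of the approximation, which shrinks as the time step decreases.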

A weak solution of (5.27) is a pair of adapted processes $(\tilde{B}, \tilde{X})$ defined on a filtered
probability space $(\tilde{\Omega}, \tilde{\mathcal{F}}, (\tilde{\mathcal{F}}_t)_{t \ge 0}, \tilde{P})$ such that $\tilde{B}$ is a version of B and the pair $(\tilde{B}, \tilde{X})$
satisfies (5.27). The weak solution is completely defined by the initial conditions,
the functions a and b, and the finite dimensional distributions of B. The particular
version of the input does not have to be specified. Note that a strong solution is a
weak solution but the converse is not generally true.

Uniqueness for the solution of (5.27) can be defined in two ways. A strong solution
is said to be unique in the strong sense if two different solutions of this equation
have the same samples except on a subset of $\Omega$ of measure zero, that is, they are
indistinguishable processes. Two solutions, weak or strong, of (5.27) are unique in
the weak sense if they have the same finite dimensional distributions, that is, they
are versions ([3], Sect. 10.4, [13], Sect. 3.2.1, [14], Sect. 5.3).

Example 5.9 Let X(t) be a real-valued process defined by the stochastic differential
equation

$$dX(t) = \mathrm{sign}(X(t))\,dB(t), \quad t \ge 0,$$

where $X(0) = 0$, B is a Brownian motion starting at zero, and $\mathrm{sign}(x) = -1$, 0, and
1 for $x < 0$, $x = 0$, and $x > 0$, respectively. This equation has a weak solution ([3],
Sect. 7.3) that is unique in the weak sense ([3], pp. 248-249) but it is not unique in
the strong sense. ◊

Proof Let $X(\cdot, \omega)$ be a sample of X(t) corresponding to a sample $B(\cdot, \omega)$ of B, that is,
$X(t, \omega) - X(0, \omega) = \int_0^t \mathrm{sign}(X(s, \omega))\,dB(s, \omega)$. Then $\tilde{X} = -X$ satisfies the equation
$\tilde{X}(t, \omega) - \tilde{X}(0, \omega) = \int_0^t \mathrm{sign}(\tilde{X}(s, \omega))\,dB(s, \omega)$. If $X(0) = 0$, then $X(\cdot, \omega)$ and $\tilde{X}(\cdot, \omega)$
are solutions corresponding to the same sample $B(\cdot, \omega)$ of B. The processes X and $\tilde{X}$
have the same probability law but their samples differ. ▲

Example 5.10 Let X be the solution of $dX(t) = -\alpha X(t)\,dt + \beta\,dB(t)$, $t \ge 0$, where
$\alpha > 0$ and $\beta$ are some constants, $X(0) \sim N(\mu(0), \gamma(0))$, B is a Brownian motion,
and X(0) is independent of B. This real-valued diffusion process, called the Ornstein-Uhlenbeck
process, is a special case of (5.27) with $d = d' = 1$, drift $a(x) = -\alpha x$,
and diffusion $b(x)^2 = \beta^2$. The theorems in the following section guarantee the
existence and the uniqueness of the solution X.

Samples of the strong solution of $dX(t) = -\alpha X(t)\,dt + \beta\,dB(t)$ can be calculated
from samples $B(\cdot, \omega)$ of B and the recurrence formula

$$X(t + \Delta t, \omega) = X(t, \omega)\,e^{-\alpha \Delta t} + \beta \int_t^{t + \Delta t} e^{-\alpha (t + \Delta t - s)}\,dB(s, \omega)$$

giving X at the end of $[t, t + \Delta t]$ from its value at the beginning of this time interval
and the input in $[t, t + \Delta t]$. The weak solution of this equation is given by a pair
of a Brownian motion B and a Gaussian process X with mean $\mu(t) = E[X(t)] =
\mu(0)\,e^{-\alpha t}$, variance $\gamma(t) = \mathrm{Var}[X(t)] = \gamma(0)\,e^{-2 \alpha t} + (1 - e^{-2 \alpha t})\,\beta^2/(2 \alpha)$, and
covariance $c(t, s) = \mathrm{Cov}[X(t), X(s)] = \gamma(s \wedge t)\,e^{-\alpha |t - s|}$. ◊

Proof The formula relating $X(t + \Delta t)$ to $X(t)$ shows that the Ornstein-Uhlenbeck
process is Gaussian, so that the second moment properties of X define its finite
dimensional distributions. The mean equation can be obtained by averaging the
defining equation for X. The stochastic integral equation

$$X(t) = X(s) - \alpha \int_s^t X(u)\,du + \beta \int_s^t dB(u), \quad t \ge s,$$

multiplied by X(s) and averaged gives

$$E[X(t)\,X(s)] = E[X(s)^2] - \alpha\,E\Big[\int_s^t X(u)\,X(s)\,du\Big] + \beta\,E\Big[X(s) \int_s^t dB(u)\Big]$$

or $r(t, s) = r(s, s) - \alpha \int_s^t r(u, s)\,du$ by using the Fubini theorem, the independence
of X(s) from future increments of the Brownian motion, and $E[dB(u)] = 0$, where
$r(t, s) = E[X(t)\,X(s)]$. This equation gives $\partial r(t, s)/\partial t = -\alpha\,r(t, s)$ by differentiation
with respect to t so that $r(t, s) = r(s, s)\,e^{-\alpha (t - s)}$. ▲
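The recurrence formula of Example 5.10 can be used directly to generate samples, because the stochastic integral over $[t, t + \Delta t]$ is a zero-mean Gaussian variable with variance $\beta^2 (1 - e^{-2 \alpha \Delta t})/(2 \alpha)$, a standard consequence of the Ito isometry. The sketch below, with hypothetical function names and illustrative parameter values, implements one such step and checks that iterating the implied variance recursion reproduces the closed-form variance $\gamma(t)$.

```python
import math
import random


def ou_step(x, alpha, beta, dt, rng):
    """Advance the Ornstein-Uhlenbeck state by dt using the exact transition."""
    noise_sd = beta * math.sqrt((1.0 - math.exp(-2.0 * alpha * dt)) / (2.0 * alpha))
    return x * math.exp(-alpha * dt) + noise_sd * rng.gauss(0.0, 1.0)


def variance_closed_form(gamma0, alpha, beta, t):
    """gamma(t) = gamma(0) e^{-2 alpha t} + (1 - e^{-2 alpha t}) beta^2 / (2 alpha)."""
    decay = math.exp(-2.0 * alpha * t)
    return gamma0 * decay + (1.0 - decay) * beta ** 2 / (2.0 * alpha)


def variance_by_recursion(gamma0, alpha, beta, dt, n_steps):
    """Propagate the variance through n_steps exact transitions of length dt."""
    gamma = gamma0
    for _ in range(n_steps):
        gamma = variance_closed_form(gamma, alpha, beta, dt)
    return gamma


alpha, beta = 1.5, 0.8                  # illustrative values
gamma_rec = variance_by_recursion(0.3, alpha, beta, dt=0.01, n_steps=200)
gamma_exact = variance_closed_form(0.3, alpha, beta, t=2.0)

# Monte Carlo check of the one-step transition from X(0) = 0 over a unit interval
rng = random.Random(11)
samples = [ou_step(0.0, alpha, beta, 1.0, rng) for _ in range(20_000)]
sample_var = sum(s * s for s in samples) / len(samples)

print(gamma_rec, gamma_exact)
print(sample_var, variance_closed_form(0.0, alpha, beta, 1.0))
```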

We state without proof two theorems giving conditions for the existence and 
uniqueness of solutions of stochastic differential equations. The statements of these 
theorems involve concepts that are clarified by the following definitions. 


Definition 5.2 Let a(x) and b(x) be $(d, 1)$ and $(d, d')$ matrices whose entries are
functions of $x \in \mathbb{R}^d$. These matrices satisfy the uniform Lipschitz conditions if there
exists a constant $c > 0$ such that

$$\|a(x_1) - a(x_2)\| \le c\,\|x_1 - x_2\| \quad \text{and} \quad \|b(x_1) - b(x_2)\|_m \le c\,\|x_1 - x_2\|, \tag{5.28}$$

where $\|\xi\| = (\sum_{i=1}^d \xi_i^2)^{1/2}$ is the Euclidean norm, $\|b\|_m = (\sum_{i=1}^d \sum_{j=1}^{d'} b_{ij}^2)^{1/2}$
denotes a matrix norm, and $x_1, x_2 \in \mathbb{R}^d$.

Definition 5.3 The matrices a and b are said to be locally Lipschitz if, for each
$\alpha > 0$, there is a constant $c_\alpha > 0$ such that

$$\|a(x_1) - a(x_2)\| \le c_\alpha\,\|x_1 - x_2\| \quad \text{and} \quad \|b(x_1) - b(x_2)\|_m \le c_\alpha\,\|x_1 - x_2\| \tag{5.29}$$

for $x_1, x_2 \in \mathbb{R}^d$ satisfying the condition $\|x_1\|, \|x_2\| < \alpha$.

For example, the function $f(x) = \sqrt{x^2 + 5}$, $x \in \mathbb{R}$, is uniform Lipschitz since
it is differentiable, the absolute value of its derivative $f'(x) = x/\sqrt{x^2 + 5}$ is smaller
than 1, and there is at least a point $\xi$ in any interval $(x_1, x_2)$ such that $f'(\xi) =
(f(x_2) - f(x_1))/(x_2 - x_1)$ by the mean value theorem ([2], Theorem 20.3). Also,
$f(x) = |x|$, $x \in \mathbb{R}$, is uniform Lipschitz since $||x_1| - |x_2|| \le |x_1 - x_2|$ by the reverse
triangle inequality. The function $f(x) = x^2$, $x \in \mathbb{R}$, is not uniform Lipschitz since
its rate of change increases indefinitely as $|x| \to \infty$, but it is locally Lipschitz. The
function $f(x) = x^{3/2} \sin(1/x)\,1(x \ne 0)$, $x \in [0, 1]$, is an example of a differentiable
function on a compact that is not locally Lipschitz since its derivative is not bounded.

Definition 5.4 The matrices a and b satisfy the growth condition if there exists a
constant $k > 0$ such that

$$x \cdot a(x) \le k\,(1 + \|x\|^2) \quad \text{and} \quad \|b(x)\|^2 \le k\,(1 + \|x\|^2), \tag{5.30}$$

where $x \cdot a(x) = \sum_{i=1}^d x_i\,a_i(x)$. The condition $x \cdot a(x) \le k\,(1 + \|x\|^2)$ is weaker
than the typically stated growth condition, $\|a(x)\|^2 \le k'\,(1 + \|x\|^2)$, because $(x -
a(x)) \cdot (x - a(x)) \ge 0$ so that $x \cdot a(x) \le (\|x\|^2 + \|a(x)\|^2)/2 \le (k' + 1)\,(1 + \|x\|^2)/2$.

The existence and uniqueness theorems for the solution of (5.27) are stated for the
case in which the drift and diffusion coefficients do not depend explicitly on time.
This is not restrictive since, if the drift and/or diffusion coefficients of an equation
depend on time, we can apply these theorems to the augmented state vector $(X^{(1)}(t) =
X(t), X^{(2)}(t) = t) \in \mathbb{R}^{d+1}$ with $X^{(1)} = X$ defined by (5.27) and $dX^{(2)}(t) = dt$
with the initial condition $X^{(2)}(0) = 0$. Theorems dealing directly with drift and
diffusion coefficients that depend explicitly on time are available ([10], Sect. 4.5,
[11], Theorem 7.1.1, p. 195, [14], Theorem 5.2.1, p. 66, [16], Sect. 4.2).

Theorem 5.7 If the drift and diffusion coefficients in (5.27) defined on a time interval
$[0, \tau]$ do not depend on time explicitly, are bounded functions and satisfy the uniform
Lipschitz conditions in (5.28), B is a Brownian motion martingale on a filtered
probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, P)$, $B(0) = 0$, and $X(0)$ is $\mathcal{F}_0$-measurable, then
(1) there exists a strong solution X for (5.27) that is unique in the strong sense, (2)
X is a $\mathcal{B}([0, \infty)) \times \mathcal{F}$-measurable, $\mathcal{F}_t$-adapted, and continuous process such that
$\sup_{0 \le t \le \tau} E[X(t)^2] < \infty$, and (3) the law of X is uniquely determined by the drift
and diffusion coefficients and the laws of B and X(0) ([3], Theorem 10.5).

Theorem 5.8 The statements in Theorem 5.7 also hold if the assumptions that the
drift and diffusion coefficients in (5.27) are bounded, uniform Lipschitz functions
are replaced with the assumptions that the drift and diffusion coefficients are locally
Lipschitz functions satisfying the growth conditions in (5.30) ([3], Theorem 10.6,
[18], Theorem 9.1, [14], Theorem 5.2.1).

Example 5.11 The stochastic differential equation

$$dX(t) = c\,X(t)\,dt + \sigma X(t)\,dB(t), \quad t \in [0, \tau], \tag{5.31}$$

has the unique strong solution

$$X(t) = X(0)\,\exp[(c - \sigma^2/2)\,t + \sigma B(t)], \tag{5.32}$$

called the geometric Brownian motion process (Example 5.6). ◊

Proof That the above stochastic differential equation has a unique strong solution
follows from Theorem 5.8 since $a(x) = c\,x$ and $b(x) = \sigma x$ are locally Lipschitz
and satisfy the growth conditions.

Theorem 5.7 cannot be applied directly since a(x) and b(x) are not bounded, but
the original problem can be modified to satisfy the requirements of this theorem.
Let X be the solution of a stochastic differential equation $dX(t) = a(X(t))\,dt +
b(X(t))\,dB(t)$, where the coefficients a, b satisfy the uniform Lipschitz conditions but
may not be bounded. For $\xi > 0$ define the function $[x]_\xi = -1(x < -\xi)\,\xi + 1(-\xi \le
x \le \xi)\,x + 1(x > \xi)\,\xi$ and consider the stochastic differential equation

$$dX_\xi(t) = a([X_\xi(t)]_\xi)\,dt + b([X_\xi(t)]_\xi)\,dB(t).$$

Since the functions $a([\cdot]_\xi)$ and $b([\cdot]_\xi)$ satisfy the conditions of Theorem 5.7, $X_\xi$ exists
and is unique in the strong sense for each $\xi > 0$. Since $\xi > 0$ is arbitrary, (5.31) has
a unique solution in the strong sense. ▲

Theorem 5.9 If the conditions of Theorem 5.7 are satisfied, the solution X of (5.27)
is a semimartingale with the representation

$$X(t) = X(0) + A(t) + M(t), \tag{5.33}$$

where $A(t) = \int_0^t a(X(s))\,ds$ is an adapted process with samples of finite variation on
compacts, $M(t) = \int_0^t b(X(s))\,dB(s)$ is an $\mathcal{F}_t$-square integrable martingale, $A(0) =
0$, and $M(0) = 0$ ([3], Theorem 10.5, p. 228).

Example 5.12 If the conditions of Theorem 5.7 are satisfied and $d = d' = 1$ in
(5.27), then this equation has a unique strong solution and

$$g(X(t)) - g(X(0)) = \int_0^t g'(X(s))\,dX(s) + \frac{1}{2} \int_0^t g''(X(s))\,b(X(s))^2\,ds$$

for any function $g \in C^2(\mathbb{R})$. ◊

Proof Since X is a continuous semimartingale, Ito’s formula in (5.3) can be applied
to g(X). This formula involves the quadratic variation process $[X,X]$, where $X(t) =
Z + A(t) + M(t)$, $Z = X(0)$, $A(t) = \int_0^t a(X(s))\,ds$, and $M(t) = \int_0^t b(X(s))\,dB(s)$.

We have $[X,X] = [Z,Z] + [Z,A] + [Z,M] + [A,Z] + [A,A] + [A,M] + [M,Z] +
[M,A] + [M,M]$ by the linearity of quadratic covariation. All quadratic covariations
with arguments Z and A or M are zero since Z is a constant process and $A(0) =
M(0) = 0$. The quadratic variation of A and the quadratic covariation of A and M are
also zero so that

$$[X,X](t) = Z^2 + [M,M](t) = Z^2 + \int_0^t b(X(s))^2\,d[B,B](s) = Z^2 + \int_0^t b(X(s))^2\,ds$$

implying $d[X,X](t) = b(X(t))^2\,dt$ since $Z^2$ is a constant process. ▲

Example 5.13 Let X be the diffusion process in Example 5.12 and $g : \mathbb{R} \to \mathbb{R}$
be an increasing function with continuous second order derivative. The memoryless
transformation $Y(t) = g(X(t))$ of X is a diffusion process defined by

$$dY(t) = a_Y(Y(t))\,dt + b_Y(Y(t))\,dB(t), \quad \text{where}$$

$$a_Y(y) = g'(x)\,a(x) + \frac{1}{2}\,g''(x)\,b(x)^2, \quad b_Y(y) = g'(x)\,b(x), \quad \text{and} \quad x = g^{-1}(y).$$

For example, let X be a real-valued diffusion process with drift $a(x) = -x$ and
diffusion $b(x) = 1$. The stochastic differential equation for $Y = g(X) = X^3$ is

$$dY(t) = 3\,\big(-Y(t) + |Y(t)|^{1/3}\,\mathrm{sign}(Y(t))\big)\,dt + 3\,|Y(t)|^{2/3}\,dB(t)$$

since $x = g^{-1}(y) = |y|^{1/3}\,\mathrm{sign}(y)$. Sample paths of Y can be obtained from samples
of X and the memoryless transformation $Y = X^3$ or can be generated directly from
the stochastic differential equation for Y.

Memoryless transformations can be used to generate diffusion processes with
specified marginal distributions. For example, $Y(t) = F^{-1}(\Phi(X(t)))$ is a diffusion
process with marginal distribution F, where $\Phi$ denotes the distribution of the standard
Gaussian variable and X(t) can be a stationary Ornstein-Uhlenbeck process with
mean 0 and variance 1. The probability law of Y(t) is defined by the properties of
X(t) and the mapping $X(t) \mapsto Y(t) = F^{-1}(\Phi(X(t)))$. Note that Y(t) is a translation
process of the type discussed in Sect. 3.7.2. ◊

Proof The differential form of Ito’s formula in (5.3) gives

$$dY(t) = \Big[g'(X(t))\,a(X(t)) + \frac{1}{2}\,g''(X(t))\,b(X(t))^2\Big]\,dt + g'(X(t))\,b(X(t))\,dB(t).$$

Since $x = g^{-1}(y)$ exists, Y is a diffusion process satisfying the stochastic differential
equation

$$dY(t) = \Big[g'(g^{-1}(Y(t)))\,a(g^{-1}(Y(t))) + \frac{1}{2}\,g''(g^{-1}(Y(t)))\,b(g^{-1}(Y(t)))^2\Big]\,dt + g'(g^{-1}(Y(t)))\,b(g^{-1}(Y(t)))\,dB(t)$$

with the stated drift and diffusion coefficients. ▲
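The coefficients $a_Y$ and $b_Y$ of Example 5.13 can be assembled mechanically from $a$, $b$, $g'$, $g''$, and $g^{-1}$. The sketch below, whose helper names are hypothetical, does so for the cubic transformation $g(x) = x^3$ with $a(x) = -x$ and $b(x) = 1$, and compares the result with the closed-form drift $3(-y + |y|^{1/3}\,\mathrm{sign}(y))$ and diffusion $3|y|^{2/3}$.

```python
import math


def transformed_coefficients(a, b, g_prime, g_second, g_inverse):
    """Drift a_Y and diffusion b_Y of Y = g(X) for dX = a(X) dt + b(X) dB."""
    def a_y(y):
        x = g_inverse(y)
        return g_prime(x) * a(x) + 0.5 * g_second(x) * b(x) ** 2

    def b_y(y):
        x = g_inverse(y)
        return g_prime(x) * b(x)

    return a_y, b_y


def cube_root(y):
    """Real cube root, x = |y|^(1/3) sign(y)."""
    return math.copysign(abs(y) ** (1.0 / 3.0), y)


a_y, b_y = transformed_coefficients(
    a=lambda x: -x,
    b=lambda x: 1.0,
    g_prime=lambda x: 3.0 * x ** 2,
    g_second=lambda x: 6.0 * x,
    g_inverse=cube_root,
)

for y in (-8.0, -1.0, 0.5, 27.0):
    drift_closed = 3.0 * (-y + cube_root(y))
    diffusion_closed = 3.0 * abs(y) ** (2.0 / 3.0)
    print(y, a_y(y), drift_closed, b_y(y), diffusion_closed)
```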

Example 5.14 Let X be defined by $dX(t) = -\alpha X(t)\,dt + \beta\,dB(t)$, where B is a
Brownian motion and $\alpha > 0$ and $\beta$ are constants. The moments $\mu(q; t) = E[X(t)^q]$
of X(t) satisfy the ordinary differential equation

$$\dot{\mu}(q; t) = -\alpha\,q\,\mu(q; t) + \frac{\beta^2\,q\,(q - 1)}{2}\,\mu(q - 2; t), \quad q = 1, 2, \ldots, \tag{5.34}$$

where $\dot{\mu}(q; t) = \partial \mu(q; t)/\partial t$ and $\mu(q; t) = 0$ for $q \le -1$ by convention. Note that
X(t) has moments of any order since it is a Gaussian process. ◊

Proof The drift and diffusion coefficients of $X$ are locally Lipschitz and satisfy the
growth conditions so that there exists a unique, adapted, and continuous solution if
$X(0) \in \mathcal{F}_0$. The integral form of Ito’s formula is

\[
g(X(t)) - g(X(0)) = \int_0^t g'(X(s))\,dX(s) + \frac{\beta^2}{2} \int_0^t g''(X(s))\,ds
\]

for $g \in C^2(\mathbb{R})$. The expectation of this equation gives

\[
E[g(X(t))] - E[g(X(0))] = -\alpha \int_0^t E[g'(X(s))\,X(s)]\,ds + \frac{\beta^2}{2} \int_0^t E[g''(X(s))]\,ds
\]

since $\beta \int_0^t g'(X(s))\,dB(s)$ is a martingale starting at zero so that its expectation is
zero and the solution $X$ is a $\mathcal{B}([0, \infty)) \times \mathcal{F}$-measurable function and $P$-integrable
so that $E[\int_0^t g'(X(s))\,X(s)\,ds] = \int_0^t E[g'(X(s))\,X(s)]\,ds$ and $E[\int_0^t g''(X(s))\,ds] =
\int_0^t E[g''(X(s))]\,ds$ by Fubini’s theorem. For $g(x) = x^q$, we have

\[
\mu(q; t) - \mu(q; 0) = -\alpha\,q \int_0^t \mu(q; s)\,ds + \frac{\beta^2 q\,(q-1)}{2} \int_0^t \mu(q-2; s)\,ds
\]

which gives (5.34) by differentiation with respect to $t$. ▲
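
The moment equations (5.34) form a closed system that can be integrated numerically, starting from the lowest order. The sketch below is an illustration only, not a method used in the text: the function names `ou_moment_rhs` and `integrate_ou_moments` are hypothetical, the classical fourth-order Runge-Kutta scheme is one possible choice of integrator, and the initial state is assumed to be $X(0) = 0$.

```python
def ou_moment_rhs(mu, alpha, beta):
    """Right side of (5.34); mu[q] approximates E[X(t)^q], q = 0, ..., len(mu) - 1."""
    rhs = [0.0] * len(mu)  # mu[0] = 1 does not change in time
    for q in range(1, len(mu)):
        lower = mu[q - 2] if q >= 2 else 0.0  # convention: mu(q; t) = 0 for q <= -1
        rhs[q] = -alpha * q * mu[q] + 0.5 * beta**2 * q * (q - 1) * lower
    return rhs


def integrate_ou_moments(mu0, alpha, beta, t_end, n_steps=2000):
    """Integrate (5.34) from t = 0 to t_end with the classical RK4 scheme."""
    h = t_end / n_steps
    mu = list(mu0)

    def shifted(base, slope, factor):
        return [m + factor * k for m, k in zip(base, slope)]

    for _ in range(n_steps):
        k1 = ou_moment_rhs(mu, alpha, beta)
        k2 = ou_moment_rhs(shifted(mu, k1, 0.5 * h), alpha, beta)
        k3 = ou_moment_rhs(shifted(mu, k2, 0.5 * h), alpha, beta)
        k4 = ou_moment_rhs(shifted(mu, k3, h), alpha, beta)
        mu = [m + h * (a + 2 * b + 2 * c + d) / 6.0
              for m, a, b, c, d in zip(mu, k1, k2, k3, k4)]
    return mu


# Assumed initial state X(0) = 0: mu(0; 0) = 1 and all other initial moments vanish.
moments = integrate_ou_moments([1.0, 0.0, 0.0, 0.0, 0.0],
                               alpha=1.0, beta=2.0**0.5, t_end=10.0)
```

For large $t$ the entries `moments[2]` and `moments[4]` should approach $\beta^2/(2\alpha)$ and $3\,(\beta^2/(2\alpha))^2$, the second and fourth moments of the Gaussian limit in Example 5.15.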

Example 5.15 Let $X$ be the Ornstein-Uhlenbeck process in Example 5.14. The characteristic
function $\varphi(u; t) = E[\exp(i\,u\,X(t))]$, $u \in \mathbb{R}$, of $X(t)$ satisfies the partial
differential equation

\[
\frac{\partial \varphi}{\partial t} = -\alpha\,u\,\frac{\partial \varphi}{\partial u} - \frac{\beta^2 u^2}{2}\,\varphi. \tag{5.35}
\]

The stationary solution $\varphi_s(u) = \lim_{t \to \infty} \varphi(u; t)$ of this equation
satisfies the ordinary differential equation $\alpha\,u\,\varphi_s'(u) + \beta^2 u^2\,\varphi_s(u)/2 = 0$ with
solution $\varphi_s(u) = \exp(-\beta^2 u^2/(4\alpha))$, so that $X(t) \sim N(0, \beta^2/(2\alpha))$ as $t \to \infty$. ◇

Proof Ito’s formula applied to the real and imaginary parts of $g(X) = \exp(i\,u\,X(t))$,
$u \in \mathbb{R}$, gives

\[
e^{i u X(t)} - e^{i u X(0)} = \int_0^t (i u)\,e^{i u X(s)}\,dX(s) + \frac{\beta^2}{2} \int_0^t (i u)^2\,e^{i u X(s)}\,ds
\]

by adding these contributions. The expectation of this equation is

\[
\varphi(u; t) - \varphi(u; 0) = -i\,u\,\alpha \int_0^t E\big[e^{i u X(s)}\,X(s)\big]\,ds - \frac{\beta^2 u^2}{2} \int_0^t E\big[e^{i u X(s)}\big]\,ds
\]

by using arguments as in the previous example. The differential equation for the
characteristic function results by differentiating the above formula with respect to $t$
and using the equality $\partial^k \varphi(u; t)/\partial u^k = E[(i\,X(t))^k \exp(i\,u\,X(t))]$.

Since $\varphi_s$ is the solution of the ordinary differential equation $\varphi_s'(u)/\varphi_s(u) =
-\beta^2 u/(2\alpha)$, it is $\ln(\varphi_s(u)) = -\beta^2 u^2/(4\alpha) + c$ or $\varphi_s(u) = d\,\exp(-\beta^2 u^2/(4\alpha))$,
where $c$ and $d$ are constants. The condition $\varphi_s(0) = 1$ implies $d = 1$, so that the
marginal distribution of $X(t)$ approaches that of $N(0, \beta^2/(2\alpha))$ as $t \to \infty$. ▲

Example 5.16 Let $X$ be defined by the stochastic differential equation $dX(t) =
-\alpha\,X(t)\,dt + \sigma\,X(t)\,dB(t)$, $t \ge 0$, where $\alpha$, $\sigma$ are constants and $B$ denotes a Brownian
motion. Then

\[
\mu(q; t) = E[X(t)^q] = \mu(q; 0)\,\exp\Big[q\,\Big(-\alpha + \frac{(q-1)\,\sigma^2}{2}\Big)\,t\Big]
\]

for any integer $q > 0$. The moment of order $q$ of $X$ approaches zero and $\pm\infty$ as
$t \to \infty$ if $\alpha > \alpha^*(q)$ and $\alpha < \alpha^*(q)$, respectively, where $\alpha^*(q) = (q-1)\,\sigma^2/2$. If
$\alpha = \alpha^*(q)$, then $\mu(q; t) = \mu(q; 0)$ is time invariant. ◇

Proof The moment equation is $\dot{\mu}(q; t) = -\alpha\,q\,\mu(q; t) + (1/2)\,q\,(q-1)\,\sigma^2\,\mu(q; t)$,
and has the stated solution. If $-\alpha + (q-1)\,\sigma^2/2 < 0$, or equivalently, $\alpha > \alpha^*(q)$,
then $\mu(q; t)$ decreases to zero as $t \to \infty$. If $\alpha < \alpha^*(q)$, $\mu(q; t)$ converges to $\pm\infty$
as $t \to \infty$ depending on the sign of $\mu(q; 0)$.

We have seen in Examples 5.6 and 5.11 that the solution of $dX(t) = -\alpha\,X(t)\,dt +
\sigma\,X(t)\,dB(t)$ is the process $X(t) = X(0)\,\exp[-(\alpha + \sigma^2/2)\,t + \sigma\,B(t)]$, referred
to as geometric Brownian motion. If $X(0)$ is independent of $B$, then $E[X(t)^q] =
E[X(0)^q]\,E[\exp(q\,(-(\alpha + \sigma^2/2)\,t + \sigma\,B(t)))]$, which coincides with $\mu(q; t)$ since
$E[\exp(q\,\sigma\,B(t))] = \exp(q^2 \sigma^2 t/2)$. ▲
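
The stability threshold $\alpha^*(q)$ grows with $q$, so a low-order moment can decay while a higher-order moment of the same process grows. The following sketch evaluates the closed-form moment of Example 5.16 for illustrative parameter values; the function names `alpha_star` and `gbm_moment` and the chosen numbers are assumptions made for this illustration.

```python
import math


def alpha_star(q, sigma):
    """Critical rate alpha*(q) = (q - 1) sigma^2 / 2 of Example 5.16."""
    return 0.5 * (q - 1) * sigma**2


def gbm_moment(q, t, alpha, sigma, mu0=1.0):
    """mu(q; t) = mu(q; 0) exp[q (-alpha + (q - 1) sigma^2 / 2) t]."""
    return mu0 * math.exp(q * (-alpha + alpha_star(q, sigma)) * t)


alpha, sigma, t = 0.25, 1.0, 10.0
mean = gbm_moment(1, t, alpha, sigma)    # decays because alpha > alpha*(1) = 0
second = gbm_moment(2, t, alpha, sigma)  # grows because alpha < alpha*(2) = 0.5
```

With these values the mean tends to zero while the second moment diverges, which shows that moment stability has to be examined order by order.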

Example 5.17 Let $X$ be the diffusion process in Example 5.12 and $h : \mathbb{R} \to \mathbb{R}$ be a
function with continuous first order derivative. Then,

\[
\int_0^t h(X(s)) \circ dX(s) = \int_0^t h(X(s))\,dX(s) + \frac{1}{2} \int_0^t h'(X(s))\,b(X(s))^2\,ds,
\]

where $\int_0^t h(X(s)) \circ dX(s)$ and $\int_0^t h(X(s))\,dX(s)$ denote the Stratonovich and Ito
integrals, respectively. This relationship follows from (5.20) and Theorem 4.13, which
give $d[h(X), X]^c(t) = h'(X(t))\,d[X, X]^c(t) = h'(X(t))\,b(X(t))^2\,dt$. ◇

Example 5.18 Let $X$ be the solution of the Ito integral equation

\[
X(t) = X(0) + \int_0^t a(X(s), s)\,ds + \int_0^t b(X(s), s)\,dB(s), \tag{5.36}
\]

where $b(x, t)$ has continuous second and first order partial derivatives with respect
to $x$ and $t$, respectively. The process also satisfies the Stratonovich integral equation

\[
X(t) = X(0) + \int_0^t \tilde{a}(X(s), s)\,ds + \int_0^t b(X(s), s) \circ dB(s), \tag{5.37}
\]

where $\tilde{a}(x, t) = a(x, t) - (1/2)\,b(x, t)\,[\partial b(x, t)/\partial x]$. ◇
Proof The Ito formula applied to $U(t) = b(X(t), t)$ gives

\[
dU(t) = \frac{\partial U}{\partial x}\,\big(a(X(t), t)\,dt + b(X(t), t)\,dB(t)\big) + \frac{\partial U}{\partial t}\,dt + \frac{1}{2}\,\frac{\partial^2 U}{\partial x^2}\,d[X, X](t)
\]

so that

\[
d[U, B](t) = \Big[\frac{\partial U}{\partial x}\,b(X(t), t)\,dB(t),\ dB(t)\Big] = b(X(t), t)\,\frac{\partial U}{\partial x}\,dt.
\]

Since $[U, B] = [U, B]^c$, the relationship between the Ito and Stratonovich integrals
in (5.20) becomes

\[
\int_0^t U(s) \circ dB(s) = \int_0^t U(s)\,dB(s) + \frac{1}{2} \int_0^t b(X(s), s)\,\frac{\partial U}{\partial x}\,ds,
\]

or, equivalently,

\[
\int_0^t b(X(s), s) \circ dB(s) = \int_0^t b(X(s), s)\,dB(s) + \frac{1}{2} \int_0^t b(X(s), s)\,\frac{\partial b(X(s), s)}{\partial x}\,ds,
\]

so that (5.36) becomes

\[
X(t) = X(0) + \int_0^t \Big[a(X(s), s) - \frac{1}{2}\,b(X(s), s)\,\frac{\partial b(X(s), s)}{\partial x}\Big]\,ds + \int_0^t b(X(s), s) \circ dB(s),
\]

by replacing $\int_0^t b(X(s), s)\,dB(s)$ with its expression in the previous formula. The
result in (5.37) is referred to as the Stratonovich differential equation. ▲

The stochastic differential equations considered in this chapter are driven by white 
noise processes that have infinite variance and can be interpreted as, for example, 
formal derivatives of Brownian motion processes. White noise is a mathematical 
abstraction that has a constant spectral density over the entire real line, according
to our previous formal calculations. In contrast, the differential equations describing 
the behavior of physical systems are driven by processes that may have a flat spectral 
density over a broad frequency band but their variance is finite. These processes 
cannot be viewed as the formal derivatives of Brownian motions, and are called 
colored noises to distinguish them from white noises. The significant difference 
between white and colored noises implies that the theory of stochastic differential 
equations cannot be used directly to solve practical problems. The theory can be used 
in applications if the drift of a differential equation with colored noise is modified and 
the colored noise is replaced with white noise. The Wong-Zakai theorem provides 
the mapping from equations with colored noise to stochastic differential equations. 

A two-step algorithm can be used to solve practical problems. First, a differential 
equation driven by colored noise defining the evolution of the state X ( t ) of a physical 
system is developed based on physics, data, and any other available information. 
Second, the resulting equation is mapped into a stochastic differential equation with
white noise such that the two equations have the same solution $X(t)$. It is preferable to
deal with differential equations driven by white rather than colored noise since they 
can be solved efficiently by using the tools of stochastic calculus. 

The subsequent example shows that, generally, it is not possible to construct sto- 
chastic differential equations with white noise whose solutions are similar to those of 
differential equations with colored noise by using heuristic arguments. Following the 
example, we state the Wong-Zakai theorem, which provides a mapping between Ito 
and Stratonovich equations that share the same solutions, that is, between differential 
equations with white and colored noise inputs. 

Example 5.19 Let $Y(t)$ be a real-valued process satisfying the differential equation

\[
\dot{Y}(t) = c\,Y(t) + \sigma\,Y(t)\,V(t), \quad t \ge 0,
\]

where $Y(0) = x \ne 0$, $c$ and $\sigma$ are constants, and $V(t)$ is a colored Gaussian noise
with continuous samples. The solution of this equation is

\[
Y(t) = x\,\exp\Big[c\,t + \sigma \int_0^t V(s)\,ds\Big]
\]

by classical calculus, where $\int_0^t V(s)\,ds$ is a path-by-path Riemann integral.

Suppose $V$ is defined by $dV(t) = -\alpha\,V(t)\,dt + \sqrt{2\alpha}\,dB(t)$, where $\alpha > 0$
and $V(0) \sim N(0, 1)$ is independent of the Brownian motion $B$. Then $V(t)$ is a
stationary Gaussian process with mean 0, covariance function $E[V(t + \tau)\,V(t)] =
\exp(-\alpha\,|\tau|)$, spectral density $s(\nu) = \alpha/[\pi\,(\alpha^2 + \nu^2)]$, and a.s. continuous
samples. Since for large values of $\alpha$ the covariance of $V(t)$ is nearly a $\delta$-function, it may
be tempting to write

\[
\lim_{\alpha \to \infty} Y(t) \simeq x\,\exp\Big[c\,t + \sigma \int_0^t dB(s)\Big] = x\,\exp[c\,t + \sigma\,B(t)]
\]

by using the approximation $V(s)\,ds \simeq dB(s)$. This may suggest that $Y(t)$ is the
solution of $dY(t) = c\,Y(t)\,dt + \sigma\,Y(t)\,dB(t)$ with $Y(0) = x$.

However, we have seen in Example 5.11 that the solution $X(t)$ of $dX(t) =
c\,X(t)\,dt + \sigma\,X(t)\,dB(t)$ with $X(0) = x$ is $X(t) = x\,\exp[(c - \sigma^2/2)\,t + \sigma\,B(t)]$. For
$Y(t)$ to match $X(t)$, the drift in the defining equation for $Y(t)$ needs to be changed
from $c\,Y(t)$ to $(c - \sigma^2/2)\,Y(t)$. ◇
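
The mismatch described in Example 5.19 can be seen on a single simulated path. The sketch below is a minimal illustration with assumed parameter values: it integrates the Ito equation $dX = c\,X\,dt + \sigma\,X\,dB$ by the Euler-Maruyama scheme (a standard discretization that is not part of the example) and compares the end value with the heuristic expression $x\,\exp[c\,t + \sigma\,B(t)]$ and with the Ito solution of Example 5.11. The function name `compare_gbm_candidates` is hypothetical.

```python
import math
import random


def compare_gbm_candidates(x=1.0, c=0.5, sigma=1.0, t_end=1.0, n_steps=20000, seed=0):
    """Euler-Maruyama for dX = c X dt + sigma X dB versus two closed forms on one path."""
    rng = random.Random(seed)
    dt = t_end / n_steps
    x_em, b = x, 0.0
    for _ in range(n_steps):
        db = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment over a step dt
        x_em += c * x_em * dt + sigma * x_em * db
        b += db                             # running value of B(t)
    heuristic = x * math.exp(c * t_end + sigma * b)                     # x exp[c t + sigma B(t)]
    ito_exact = x * math.exp((c - 0.5 * sigma**2) * t_end + sigma * b)  # Example 5.11
    return x_em, heuristic, ito_exact


x_em, heuristic, ito_exact = compare_gbm_candidates()
```

On the same Brownian path the two closed forms differ by the factor $\exp(\sigma^2 t/2)$, and the numerical solution of the Ito equation follows the expression with the corrected drift.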

Theorem 5.10 (Wong-Zakai theorem) Let $X(t)$ and $X_n(t)$ be real-valued processes
defined by the stochastic differential equations

\[
dX(t) = \Big[a(X(t), t) + \frac{1}{2}\,b(X(t), t)\,\frac{\partial b(X(t), t)}{\partial x}\Big]\,dt + b(X(t), t)\,dB(t), \quad t \in [0, \tau], \tag{5.38}
\]

and

\[
dX_n(t) = a(X_n(t), t)\,dt + b(X_n(t), t)\,dB_n(t), \quad t \in [0, \tau], \tag{5.39}
\]

where $B_n(t)$ is an approximation of $B(t)$ and the initial states $X(0) = X_n(0)$ are
independent of $B$ and $B_n$. If (1) the drift and diffusion coefficients in (5.38) and (5.39) are
such that the solutions of these equations exist and are unique in the strong sense,
(2) $B_n(t)$ converges a.s. to $B(t)$ for all $t \in [0, \tau]$ as $n \to \infty$ and its samples are
continuous and of bounded variation on $[0, \tau]$, (3) $B_n$ is uniformly bounded for
almost all $\omega$, and (4) the samples of $B_n$ have piecewise continuous derivatives, then
$X_n(t) \to X(t)$ a.s., $n \to \infty$, at all $t \in [0, \tau]$. If in addition $B_n(t) \to B(t)$ uniformly
in $[0, \tau]$, then $X_n(t) \to X(t)$ uniformly in $[0, \tau]$ a.s. as $n \to \infty$ [19].

Approximations $B_n$ for $B$ satisfying the requirements of the Wong-Zakai theorem
can be constructed simply. For example, $B_n$ can interpolate linearly between values
of $B$ at the consecutive points of a partition $p_n = \{t_0, t_1, \ldots, t_n\}$, $0 = t_0 < t_1 < \cdots <
t_n = \tau$, of $[0, \tau]$ with mesh $\Delta(p_n) \to 0$ as $n \to \infty$. The process $B_n$ has continuous
samples that approach the samples of $B$ as $n \to \infty$ and are of finite variation on
compacts, so that $\int_0^t b(X_n(s), s)\,dB_n(s)$ can be calculated as a path-by-path Riemann-
Stieltjes integral. However, $B_n$ differs from $B$ in an essential manner. The Brownian
motion is a martingale while $B_n$ is not. For example, $B_n(s) = B(t_{i-1}) + \Delta B_i\,(s -
t_{i-1})/\Delta t_i$, for $s \in [t_{i-1}, t_i]$, where $\Delta B_i = B(t_i) - B(t_{i-1})$ and $\Delta t_i = t_i - t_{i-1}$. Since
$B_n(s)$ depends on $B(t_i)$, it is not $\mathcal{F}_s$-measurable for $s < t_i$, so that it is not a martingale.
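
The piecewise linear approximation $B_n$ can be sketched in a few lines. The code below is an illustration under stated assumptions: the helper names `sample_brownian` and `piecewise_linear` are hypothetical, the partition is uniform on $[0, 1]$ with mesh $1/8$, and the Brownian values at the partition points are generated from independent Gaussian increments.

```python
import bisect
import math
import random


def sample_brownian(times, rng):
    """Values of a Brownian motion with B(0) = 0 at the increasing points `times`."""
    values = [0.0]
    for t_prev, t_next in zip(times, times[1:]):
        values.append(values[-1] + rng.gauss(0.0, math.sqrt(t_next - t_prev)))
    return values


def piecewise_linear(times, values, s):
    """B_n(s): linear interpolation between B(t_{i-1}) and B(t_i) for s in [t_{i-1}, t_i]."""
    i = min(max(bisect.bisect_right(times, s), 1), len(times) - 1)
    weight = (s - times[i - 1]) / (times[i] - times[i - 1])
    # B_n(s) uses the future value B(t_i), which is why B_n is not adapted.
    return values[i - 1] + weight * (values[i] - values[i - 1])


rng = random.Random(1)
times = [k / 8 for k in range(9)]  # partition of [0, 1] with mesh 1/8
values = sample_brownian(times, rng)
midpoint = piecewise_linear(times, values, 1 / 16)
```

The value `midpoint` equals half of $B(t_1)$, so it already depends on the Brownian motion at a later time, which is the lack of measurability noted above.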

Note also that there is no drift correction for equations with additive noise, that is,
equations with state independent diffusion coefficients, and that the solution $X_n(t)$
of the stochastic integral equation

\[
X_n(t) = X_n(0) + \int_0^t \Big[a(X_n(s), s) - \frac{1}{2}\,b(X_n(s), s)\,\frac{\partial b(X_n(s), s)}{\partial x}\Big]\,ds + \int_0^t b(X_n(s), s)\,dB_n(s)
\]

driven by colored noise converges in the sense of Wong-Zakai’s theorem to the
solution $X(t)$ of the Ito equation (5.27) as $n \to \infty$.

Fig. 5.1 Samples of $W(t)$ (left panel) and an estimate of $E[W(t)^2]$ (right panel)

Example 5.20 Let $X(t)$, $t \ge 0$, be a geometric Brownian motion, so that it satisfies
the Ito equation $dX(t) = c\,X(t)\,dt + \sigma\,X(t)\,dB(t)$ with initial state $X(0) = x$,
where $c, \sigma \in \mathbb{R}$ are constants. The Stratonovich version of this equation is $\dot{X}(t) =
(c\,X(t) - \sigma^2 X(t)/2) + \sigma\,X(t)\,V(t)$ by the Wong-Zakai theorem, where $V(t)$ is a colored
stationary Gaussian noise with mean 0 and short memory. For example, $V(t)$ can
be an Ornstein-Uhlenbeck process defined by $dV(t) = -\alpha\,V(t)\,dt + \alpha\,dB(t)$, $t \ge 0$,
with initial state $V(0) \sim N(0, \alpha/2)$ that is independent of the Brownian motion
$B(t)$. The stationary covariance function of $V(t)$ is $c_v(\tau) = E[V(t + \tau)\,V(t)] =
(\alpha/2)\,\exp(-\alpha\,|\tau|)$.

The solution of the Stratonovich equation can be obtained by classical calculus
and is $X(t) = X(0)\,\exp[(c - \sigma^2/2)\,t + \sigma\,W(t)]$, where $W(t) = \int_0^t V(s)\,ds$ is a
Gaussian process with mean 0 and variance $E[W(t)^2] = t - (1 - \exp(-\alpha\,t))/\alpha$.
Figure 5.1 shows samples of $W(t)$ and an estimate of $E[W(t)^2]$ obtained from 100
independent samples of $W(t)$ for $\alpha = 50$. The samples of $W(t)$ for this value of $\alpha$ resemble
the samples of Brownian motion processes and $E[W(t)^2] \simeq t$. ◇

Proof The process $W(t)$ is Gaussian with mean 0 and covariance function
$E[W(t + \tau)\,W(t)] = E[W(t)^2] + E\big[\int_t^{t+\tau} V(u)\,du \int_0^t V(v)\,dv\big]$ for $\tau \ge 0$. The
variance of $W(t)$ is given by $E[W(t)^2] = \int_0^t \int_0^t c_v(u - v)\,du\,dv = 2 \int_0^t (t -
\eta)\,c_v(\eta)\,d\eta$ by the change of variables $\xi = v$ and $\eta = u - v$, and has the stated
expression for $c_v(\tau) = (\alpha/2)\,\exp(-\alpha\,|\tau|)$. Note that $\int_{-\infty}^{\infty} c_v(\tau)\,d\tau = 1$ for every
$\alpha > 0$ so that $c_v(\tau) \to \delta(\tau)$ as $\alpha \to \infty$. ▲
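
The closed-form variance of $W(t)$ can be checked against the single integral $2 \int_0^t (t - \eta)\,c_v(\eta)\,d\eta$ in the proof. The sketch below evaluates that integral by the composite Simpson rule; the function names and the number of subintervals are assumptions made for this illustration.

```python
import math


def cov_v(tau, alpha):
    """Stationary covariance c_v(tau) = (alpha / 2) exp(-alpha |tau|) of V."""
    return 0.5 * alpha * math.exp(-alpha * abs(tau))


def var_w_numeric(t, alpha, n=20000):
    """2 * int_0^t (t - eta) c_v(eta) d eta by the composite Simpson rule (n even)."""
    h = t / n
    total = 0.0
    for k in range(n + 1):
        eta = k * h
        weight = 1 if k in (0, n) else (4 if k % 2 else 2)
        total += weight * (t - eta) * cov_v(eta, alpha)
    return 2.0 * total * h / 3.0


def var_w_closed(t, alpha):
    """Stated expression E[W(t)^2] = t - (1 - exp(-alpha t)) / alpha."""
    return t - (1.0 - math.exp(-alpha * t)) / alpha


numeric = var_w_numeric(1.0, 50.0)
closed = var_w_closed(1.0, 50.0)  # close to 0.98 for alpha = 50 and t = 1
```

For $\alpha = 50$ the variance at $t = 1$ is about 0.98, which is consistent with the remark that $E[W(t)^2] \simeq t$ for large $\alpha$.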
5.5.1.2 Semimartingale White Noise

Let $X(t)$ be the solution of (5.25) and (5.26), where $Y(t)$ is a semimartingale denoted
by $S(t)$. As previously, the integrals $\int_0^t a(X(s-))\,ds$ and $\int_0^t b(X(s-))\,dS(s)$ in (5.26)
are Riemann and Ito integrals. We state without proof conditions for the existence
and uniqueness of the solution of (5.26).

Theorem 5.11 If $S$ is a semimartingale with $S(0) = 0$, $X(0)$ is finite and
$\mathcal{F}_0$-measurable, and the function $G : [0, \infty) \times \Omega \times \mathbb{R} \to \mathbb{R}$ is such that (1) the samples
$(t, \omega) \mapsto G(t, \omega, x)$ are in $\mathcal{L}$ for a fixed $x$ and (2) $|G(t, \omega, x) - G(t, \omega, x')| \le
K(\omega)\,|x - x'|$ for each $(t, \omega)$, where $K$ is a finite-valued random variable, then the
stochastic integral equation

\[
X(t) = X(0) + \int_0^t G(s, \cdot, X(s-))\,dS(s) \tag{5.40}
\]

has a unique solution that is a semimartingale ([15], Theorem 6, p. 194).

An extension of this result is provided by Theorem 5.12 whose statement involves
the following concepts. Let $\mathcal{D}^d$ denote the class of $\mathbb{R}^d$-valued stochastic processes
whose coordinates are adapted processes with samples in $\mathcal{D}$. An operator $G : \mathcal{D}^d \to
\mathcal{D}$ is said to be functional Lipschitz if, for any processes $X, Y \in \mathcal{D}^d$, (1) $X^{T-} =
Y^{T-}$ implies $G(X)^{T-} = G(Y)^{T-}$, where the superscript $T-$ denotes processes stopped
at $T-$ and $T$ is a stopping time, and (2) there is an increasing finite process $K(t)$, $t \ge 0$, such that
$|G(X)(t) - G(Y)(t)| \le K(t)\,\|X(t) - Y(t)\|$ a.s. for each $t \ge 0$ ([15], p. 195).

Theorem 5.12 If $S = (S_1, \ldots, S_{d'})$ is a vector of semimartingales, $S(0) = 0$, $J_i \in
\mathcal{D}$, $i = 1, \ldots, d$, and the operators $G_j^i : \mathcal{D}^d \to \mathcal{D}$, $i = 1, \ldots, d$, $j = 1, \ldots, d'$,
are functional Lipschitz, then the system of stochastic integral equations

\[
X_i(t) = J_i(t) + \sum_{j=1}^{d'} \int_0^t G_j^i(X)(s-)\,dS_j(s), \quad i = 1, \ldots, d, \tag{5.41}
\]

has a unique solution in $\mathcal{D}^d$. If the processes $J_i$ are semimartingales, then $X_i$, $i =
1, \ldots, d$, are also semimartingales ([15], Theorem 7, p. 197).

Example 5.21 Let $dX(t) = b(X(t-), t)\,dC(t)$ be a stochastic differential equation
with the integral form

\[
X(t) = X(0) + \int_0^t b(X(s-), s)\,dC(s) = X(0) + \sum_{k=1}^{N(t)} b(X(T_k-), T_k)\,Y_k,
\]

where $C(t)$ is a compound Poisson process with jump times $(T_1, T_2, \ldots)$ and iid
jumps $(Y_1, Y_2, \ldots)$. If $X(0) \in \mathcal{F}_0$, $t \mapsto b(x, t)$ is continuous for each $x$, and
$x \mapsto b(x, t)$ satisfies a uniform Lipschitz condition for each $t \ge 0$, then the above
equation has a unique solution that is a semimartingale. ◇
Proof Theorem 5.11 guarantees the existence and uniqueness of the solution $X$. The
second condition on $G$ in this theorem becomes the uniform Lipschitz condition
$|b(x, t) - b(x', t)| \le c\,|x - x'|$ for each $t \ge 0$ and a constant $c > 0$ since $b$ is a
deterministic function. Since $S = C$, the samples of $X$ are constant between consecutive
jumps of $C$ and have jumps $\Delta X(T_k) = X(T_k) - X(T_k-) = b(X(T_k-), T_k)\,\Delta C(T_k) =
b(X(T_k-), T_k)\,Y_k$ at the jump times of $C$ so that $X(t) = X(T_k-) + b(X(T_k-), T_k)\,Y_k$,
$t \in [T_k, T_{k+1})$, which gives the stated formula for $X$. ▲
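
Because $X$ changes only at the jump times of $C$, the sum in Example 5.21 can be evaluated exactly, jump by jump. The sketch below is an illustration with hypothetical helper names (`sample_compound_poisson`, `solve_jump_equation`); the jump times are generated from exponential interarrival times with rate `lam`, and the Gaussian jump distribution is an arbitrary choice.

```python
import random


def sample_compound_poisson(lam, t_end, draw_jump, rng):
    """Jump times T_k (exponential gaps with rate lam) and iid jumps Y_k on (0, t_end]."""
    times, jumps, t = [], [], rng.expovariate(lam)
    while t <= t_end:
        times.append(t)
        jumps.append(draw_jump(rng))
        t += rng.expovariate(lam)
    return times, jumps


def solve_jump_equation(x0, b, times, jumps, t):
    """X(t) = X(0) + sum over T_k <= t of b(X(T_k-), T_k) Y_k."""
    x = x0
    for t_k, y_k in zip(times, jumps):
        if t_k > t:
            break
        x += b(x, t_k) * y_k  # x holds X(T_k-) before the update
    return x


rng = random.Random(2)
times, jumps = sample_compound_poisson(3.0, 1.0, lambda r: r.gauss(0.0, 1.0), rng)
x_end = solve_jump_equation(1.0, lambda x, t: x, times, jumps, 1.0)
```

For the linear choice $b(x, t) = x$ used in the last line, the recursion gives $X(t) = X(0) \prod_{k} (1 + Y_k)$ over the jumps up to time $t$, and for $b = 1$ it returns $X(0) + C(t)$.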

Example 5.22 Let $X$ be the solution of $dX(t) = a(X(t-), t)\,dt + b(X(t-), t)\,dS(t)$,
where $S$ is a semimartingale. If $X(0) \in \mathcal{F}_0$, the functions $t \mapsto a(x, t), b(x, t)$ are
continuous for each $x$, and the functions $x \mapsto a(x, t), b(x, t)$ satisfy uniform Lipschitz
conditions for each $t \ge 0$, then $X$ is the unique solution of the stochastic integral
equation

\[
X(t) = X(0) + \int_0^t \big[a(X(s-), s) \quad b(X(s-), s)\big] \begin{bmatrix} ds \\ dS(s) \end{bmatrix}
\]

by Theorem 5.11 since $(s, S(s))$ is a semimartingale. ◇

Example 5.23 Let $X$ be the process in Example 5.21 and $g \in C^2(\mathbb{R})$ denote a
real-valued function. We have

\[
g(X(t)) - g(X(0)) = \int_0^t a(X(s-))\,g'(X(s-))\,ds
+ \int_0^t \int_{\mathbb{R}} \big[g(X(s-) + y\,b(X(s-))) - g(X(s-))\big]\,\mathcal{M}(ds, dy), \tag{5.42}
\]

where $\mathcal{M}$ is a Poisson random measure on $[0, \infty) \times \mathbb{R}$, $E[\mathcal{M}(ds, dy)] = (\lambda\,ds)\,
dF(y)$, $\lambda > 0$ is the intensity of the Poisson process $N$, and $F$ denotes the distribution
of the iid random variables $Y_k$ ([17], Theorem 4.2.2). Note that (5.42) becomes
$g(C(t)) - g(C(0)) = \int_0^t \int_{\mathbb{R}} [g(C(s-) + y) - g(C(s-))]\,\mathcal{M}(ds, dy)$ in the special
case $a = 0$ and $b = 1$, a result consistent with Example 5.5. ◇

Proof The Ito formula in (5.7) gives

\[
g(X(t)) - g(X(0)) = \int_{0+}^t g'(X(s-))\,dX(s) + \frac{1}{2} \int_{0+}^t g''(X(s-))\,d[X, X]^c(s)
+ \sum_{0 < s \le t} \big[g(X(s)) - g(X(s-)) - g'(X(s-))\,\Delta X(s)\big]
\]

so that

\[
g(X(t)) - g(X(0)) = \int_0^t g'(X(s-))\,a(X(s-))\,ds + \sum_{0 < s \le t} \big[g(X(s)) - g(X(s-))\big] \tag{5.43}
\]

since $\Delta X(s) = b(X(s-))\,\Delta C(s)$, $[X, X]^c = 0$, and $\int_{0+}^t g'(X(s-))\,b(X(s-))\,dC(s) =
\sum_{0 < s \le t} g'(X(s-))\,b(X(s-))\,\Delta C(s) = \sum_{0 < s \le t} g'(X(s-))\,\Delta X(s)$. This formula,
the integral form $\int_0^t \int_{\mathbb{R}} [g(X(s-) + y\,b(X(s-))) - g(X(s-))]\,\mathcal{M}(ds, dy)$ of $\sum_{0 < s \le t}
[g(X(s)) - g(X(s-))]$, and $g(X(s)) = g(X(s-) + b(X(s-))\,\Delta C(s))$ give (5.42). ▲

Example 5.24 Let $X(t)$ be the solution of $dX(t) = -\alpha\,X(t-)\,dt + dC(t)$, $t \ge 0$,
starting at $X(0) = X_0$, where $\alpha > 0$ is a constant, $C(t) = \sum_{k=1}^{N(t)} Y_k$ is a compound
Poisson process, $N(t)$ denotes a Poisson process with intensity $\lambda > 0$, and $\{Y_k\}$ are
iid random variables. It is assumed that $X_0$ is independent of $C(t)$ and that $X_0$ and $Y_1$
have finite moments of any order. The moments $\mu(q; t) = E[X(t)^q]$ of $X(t)$ satisfy
the ordinary differential equation

\[
\dot{\mu}(q; t) = -\alpha\,q\,\mu(q; t) + \lambda \sum_{k=1}^{q} \frac{q!}{k!\,(q-k)!}\,\mu(q-k; t)\,E[Y_1^k], \quad t \ge 0, \tag{5.44}
\]

with initial condition $\mu(q; 0) = E[X_0^q]$ and the convention $\mu(r; t) = 0$ for $r < 0$,
where $q \ge 1$ is an integer. ◇

Proof The version of Ito’s formula in (5.43) with $g(x) = x^q$ gives

\[
\begin{aligned}
X(t)^q - X(0)^q &= -\alpha\,q \int_{0+}^t X(s-)^q\,ds + \sum_{0 < s \le t} \big[(X(s-) + \Delta X(s))^q - X(s-)^q\big] \\
&= -\alpha\,q \int_{0+}^t X(s-)^q\,ds + \sum_{0 < s \le t} \Big[\sum_{k=1}^{q} \frac{q!}{k!\,(q-k)!}\,X(s-)^{q-k}\,\Delta X(s)^k\Big].
\end{aligned} \tag{5.45}
\]

The expectation of the left side of this equation is $\mu(q; t) - \mu(q; 0)$. The expectation
of the first term on the right side is $-\alpha\,q \int_{0+}^t \mu(q; s)\,ds$ by Fubini’s theorem since $X$
is integrable and measurable in both arguments. We have $E[X(s-)^q] = E[X(s)^q] =
\mu(q; s)$ since $X(s-) = X(s)$ with probability 1 at an arbitrary but fixed $s$. The latter
statement holds since $X(s-) = \lim_{u \uparrow s} X(u)$ differs from $X(s)$ if and only if $C$ has
a jump at $s$. However, the probability that $C$ has at least a jump in $(u, s]$, $u < s$,
is $P(N(s - u) \ge 1) = 1 - \exp(-\lambda\,(s - u))$ and approaches 0 as $u \uparrow s$, implying
that $X(s-)$ and $X(s)$ can only differ on a set of measure zero. The derivatives with
respect to $t$ of the terms on the left side and the first term on the right side of (5.45)
are $\dot{\mu}(q; t)$ and $-\alpha\,q\,\mu(q; t)$.

Consider now the second term on the right side of (5.45). The expectation of the
increment of this term in the time interval $(t, t + \Delta t]$ is

\[
E\Big[\sum_{t < s \le t + \Delta t} \sum_{k=1}^{q} \frac{q!}{k!\,(q-k)!}\,X(s-)^{q-k}\,\Delta X(s)^k\Big]
= \sum_{n=0}^{\infty} E\Big[\sum_{j=1}^{n} \Big(\sum_{k=1}^{q} \frac{q!}{k!\,(q-k)!}\,X(T_j-)^{q-k}\,\Delta X(T_j)^k\Big) \,\Big|\, N(\Delta t) = n\Big]\,P(N(\Delta t) = n),
\]

where $\{T_j\}$ denote the jump times of $C$ in the time interval $(t, t + \Delta t]$ and $P(N(\Delta t) =
n) = (\lambda\,\Delta t)^n \exp(-\lambda\,\Delta t)/n!$, $n \ge 0$. The above expectations can be calculated
term by term since $C$ has a finite number of jumps in any finite time interval with
probability 1. Under the assumption that $X$ has finite moments of any order, the term
corresponding to $n = 0$ is zero since $\Delta X(s) = 0$ for all $s \in (t, t + \Delta t]$ and the terms
corresponding to $n > 1$ are of order $O(\Delta t)^n$, so that the term

\[
\begin{aligned}
E\Big[\sum_{t < s \le t + \Delta t} \sum_{k=1}^{q} \frac{q!}{k!\,(q-k)!}\,X(s-)^{q-k}\,\Delta X(s)^k\Big]
&= \sum_{k=1}^{q} \frac{q!}{k!\,(q-k)!}\,E\big[X(T_1-)^{q-k}\,\Delta X(T_1)^k\big]\,\lambda\,\Delta t\,\exp(-\lambda\,\Delta t) + O(\Delta t)^2 \\
&= \sum_{k=1}^{q} \frac{q!}{k!\,(q-k)!}\,\mu(q-k; t)\,E[Y_1^k]\,\lambda\,\Delta t\,\exp(-\lambda\,\Delta t) + O(\Delta t)^2
\end{aligned}
\]

scaled by $\Delta t$ converges to the second term on the right side of (5.44) as $\Delta t \to 0$. ▲
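
Setting the time derivative in (5.44) to zero gives a recursion for the stationary moments of $X$, provided these moments exist. This step is not carried out in the text and is offered only as a consistency check; the function name `stationary_moments` and the exponential jump distribution with unit mean are assumptions made for the illustration.

```python
import math


def stationary_moments(alpha, lam, jump_moments, q_max):
    """Stationary solution of (5.44):
    mu(q) = lam / (alpha q) * sum_{k=1}^{q} C(q, k) mu(q - k) E[Y^k].

    jump_moments[k] must hold E[Y_1^k] for k = 1, ..., q_max (index 0 is unused).
    """
    mu = [1.0]  # mu(0) = E[X^0] = 1
    for q in range(1, q_max + 1):
        total = sum(math.comb(q, k) * mu[q - k] * jump_moments[k]
                    for k in range(1, q + 1))
        mu.append(lam * total / (alpha * q))
    return mu


# Assumed exponential jumps with unit mean, for which E[Y^k] = k!
jump_moments = [None] + [float(math.factorial(k)) for k in range(1, 4)]
mu = stationary_moments(alpha=1.0, lam=2.0, jump_moments=jump_moments, q_max=3)
variance = mu[2] - mu[1] ** 2
```

The first two steps of the recursion give the stationary mean $\lambda\,E[Y_1]/\alpha$ and the stationary variance $\lambda\,E[Y_1^2]/(2\alpha)$, which equal 2 for the values used here.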


5.5.2 Tanaka’s Formula 


Ito’s formula cannot be applied to $B \mapsto |B|$ since this mapping is not in $C^2(\mathbb{R})$.
Mappings of this type are needed to solve locally a class of deterministic partial
differential equations with mixed boundary conditions (Sect. 5.5.3). The extension
of Ito’s formula to the mapping $B \mapsto |B|$ is referred to as Tanaka’s formula.

Definition 5.5 The local time process is

\[
L(t) = \lim_{\varepsilon \downarrow 0} \frac{1}{2\varepsilon} \int_0^t 1(-\varepsilon < B(s) < \varepsilon)\,ds, \tag{5.46}
\]

where the limit exists in $L_2$ and a.s. ([3], Chap. 7).


Theorem 5.13 (Tanaka’s formula) If $B$ is a Brownian motion, then

\[
|B(t)| - |B(0)| = \int_0^t \mathrm{sign}(B(s))\,dB(s) + \lim_{\varepsilon \downarrow 0} \frac{1}{2\varepsilon} \int_0^t 1(-\varepsilon < B(s) < \varepsilon)\,ds
= \tilde{B}(t) + L(t) \quad \text{a.s.}, \tag{5.47}
\]

where the limit is in $L_2$, $\tilde{B}$ is a Brownian motion, and $\mathrm{sign}(x)$ is $-1$, 0, and 1 for
$x < 0$, $x = 0$, and $x > 0$, respectively.

Proof For a proof of Tanaka’s formula see [3] (Sect. 7.3) and [12] (Sect. 8.6).
We only note that $|B|$ is a semimartingale since (1) $L$ is a continuous increasing
process that is adapted to the natural filtration of $B$ and (2) $\tilde{B}$ is a square integrable
martingale with continuous samples and quadratic variation $[\tilde{B}, \tilde{B}](t) =
\int_0^t (\mathrm{sign}(B(s)))^2\,d[B, B](s) = t$ a.s. Hence, $\tilde{B}$ is a Brownian motion (Example 5.8).
Note that formal application of Ito’s formula to the mapping $B \mapsto g(B) = |B|$ gives

\[
|B(t)| - |B(0)| = \int_0^t \mathrm{sign}(B(s))\,dB(s) + \int_0^t \delta(B(s))\,ds
\]

since $g'(x) = \mathrm{sign}(x)$ and $g''(x) = 2\,\delta(x)$. The first integral in this equation is $\tilde{B}(t)$
in Tanaka’s formula while the second integral corresponds to $L(t)$. ▲
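
Tanaka's formula suggests a simple way to approximate the local time along a simulated path, namely $L(t) = |B(t)| - |B(0)| - \int_0^t \mathrm{sign}(B(s))\,dB(s)$ with the stochastic integral replaced by a left-point sum. The sketch below is an illustration under assumptions chosen here: $B(0) = 0$, a uniform time grid, and the hypothetical function name `discrete_tanaka`. Since the stochastic integral has zero mean, $E[L(t)] = E[|B(t)|] = \sqrt{2t/\pi}$ for $B(0) = 0$, which gives a reference value for the sample mean.

```python
import math
import random


def discrete_tanaka(n_steps, t_end, rng):
    """Return |B(t_end)| and the left-point sum for int sign(B) dB on one path, B(0) = 0."""
    dt = t_end / n_steps
    b, stochastic_integral = 0.0, 0.0
    for _ in range(n_steps):
        db = rng.gauss(0.0, math.sqrt(dt))
        sign = (b > 0) - (b < 0)          # sign(0) = 0, as in Theorem 5.13
        stochastic_integral += sign * db  # integrand evaluated at the left end point
        b += db
    return abs(b), stochastic_integral


rng = random.Random(3)
n_paths = 4000
local_times = []
for _ in range(n_paths):
    abs_b, integral = discrete_tanaka(200, 1.0, rng)
    local_times.append(abs_b - integral)  # L(t) = |B(t)| - int_0^t sign(B(s)) dB(s)
mean_local_time = sum(local_times) / n_paths
```

Each discrete increment of the approximation is nonnegative and is nonzero only on steps where the path changes sign or starts from zero, which mirrors the fact that $L$ increases only when $B$ is at zero.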

Following are two extensions of Tanaka’s formula. The first and second extensions
involve Brownian motion reflected at zero and at two thresholds, respectively.

Theorem 5.14 If $g \in C^2(\mathbb{R})$ and $\int_0^t g'(|B(s)|)\,dL(s) = g'(0)\,L(t)$, then

\[
g(|B(t)|) - g(|B(0)|) = \int_0^t g'(|B(s)|)\,\mathrm{sign}(B(s))\,dB(s) + \frac{1}{2} \int_0^t g''(|B(s)|)\,ds + g'(0)\,L(t). \tag{5.48}
\]


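Proof Since $g \in C^2(\mathbb{R})$ and the $\mathbb{R}^2$-valued process $X = (\tilde{B}, L)$ is a semimartingale,
the Ito formula in (5.15) applied to the mapping $X \mapsto g(X) = g(\tilde{B} + L)$ gives

\[
\begin{aligned}
g(|B(t)|) - g(|B(0)|) &= \int_0^t g'(|B(s)|)\,d(\tilde{B} + L)(s) + \frac{1}{2} \int_0^t g''(|B(s)|)\,ds \\
&= \int_0^t g'(|B(s)|)\,d\tilde{B}(s) + \frac{1}{2} \int_0^t g''(|B(s)|)\,ds + \int_0^t g'(|B(s)|)\,dL(s) \\
&= \int_0^t g'(|B(s)|)\,d\tilde{B}(s) + \frac{1}{2} \int_0^t g''(|B(s)|)\,ds + g'(0)\,L(t),
\end{aligned}
\]

which is the formula in (5.48) since $d\tilde{B}(s) = \mathrm{sign}(B(s))\,dB(s)$. An alternative proof is in [7] (Sect. 6.2.3.1). ▲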

Example 5.25 Let $u$ be the solution of the Schrödinger equation $(1/2)\,u''(x) + q\,u(x) =
0$, $x \in D = (0, l)$, with the Neumann and Dirichlet boundary conditions $u'(0) = \alpha$
and $u(l) = \beta$, where $\alpha$, $\beta$ are some constants. The local solution of this equation is

\[
u(x) = \beta\,E^x\big[e^{q T}\big] - \alpha\,E^x\Big[\int_0^T e^{q s}\,dL(s)\Big],
\]

where $T = \inf\{t \ge 0 : |B(t)| \notin D\}$, $B(0) = x \in D$, and $E^x[\cdot] = E[\cdot \mid B(0) = x]$.
The exact solution of the Schrödinger equation is

\[
u(x) = \frac{\beta - (\alpha/\sqrt{2q})\,\sin(\sqrt{2q}\,l)}{\cos(\sqrt{2q}\,l)}\,\cos(\sqrt{2q}\,x) + \frac{\alpha}{\sqrt{2q}}\,\sin(\sqrt{2q}\,x)
\]

for $q > 0$. The local solution of this equation is based on an $\mathbb{R}^2$-valued process $X$
with coordinates $X_1 = |B|$ and $X_2$ defined by $dX_2(t) = q\,X_2(t)\,dt$ with $X_2(0) = 1$.
Estimates of $u$ for $\alpha = 1$, $\beta = 2$, $q = 1$, and $l = 1$ based on $n = 500$ independent
samples of $X$ generated at a time step of $\Delta t = 0.0005$ are in error by less than 2%.
The error has been calculated with respect to the exact solution. ◇

Proof Ito’s formula applied to the mapping $X \mapsto u(|B(t)|)\,X_2(t) = u(\tilde{B}(t) + L(t))\,X_2(t)$
gives

\[
\begin{aligned}
u(|B(t)|)\,X_2(t) - u(x) &= \int_0^t u'(\tilde{B}(s) + L(s))\,X_2(s)\,d(\tilde{B}(s) + L(s)) \\
&\quad + \int_0^t u(\tilde{B}(s) + L(s))\,dX_2(s) + \frac{1}{2} \int_0^t u''(\tilde{B}(s) + L(s))\,X_2(s)\,ds,
\end{aligned}
\]

so that

\[
\begin{aligned}
E^x[u(|B(T)|)\,X_2(T)] - u(x) &= E^x\Big[\int_0^T u'(|B(s)|)\,X_2(s)\,dL(s)\Big] + q\,E^x\Big[\int_0^T u(|B(s)|)\,X_2(s)\,ds\Big] \\
&\quad + \frac{1}{2}\,E^x\Big[\int_0^T u''(|B(s)|)\,X_2(s)\,ds\Big]
\end{aligned}
\]

by averaging and using $T$ in place of $t$ ([14], Chap. 9). Since $|B(s)|$ is in $D$ for $s < T$,
then $(1/2)\,u''(|B(s)|) + q\,u(|B(s)|) = 0$ so that

\[
E^x[u(|B(T)|)\,X_2(T)] - u(x) = u'(0)\,E^x\Big[\int_0^T X_2(s)\,dL(s)\Big],
\]

which gives $u(x)$ since $u(|B(T)|) = \beta$, $X_2(s) = e^{q s}$, and $u'(0) = \alpha$. ▲

Let $r$ be a periodic function with period $2\,(b - a)$, $a < b$, defined by $r(x) = |x - a|$
for $|x - a| \le b - a$. The process $r(B)$ has the range $[0, b - a]$, and is referred
to as Brownian motion reflected at two thresholds. Let $x_k(a) = a + 2k\,(b - a)$
and $x_k(b) = b + 2k\,(b - a)$, $k \in \mathbb{Z}$, be the collection of points on the real line
where $r(x)$ is equal to 0 and $b - a$, respectively. For $\varepsilon \in (0, (b - a)/2)$ define the
intervals $I_{k,\varepsilon}(a) = \{x : |x - x_k(a)| \le \varepsilon\}$ and $I_{k,\varepsilon}(b) = \{x : |x - x_k(b)| \le \varepsilon\}$
centered on $x_k(a)$ and $x_k(b)$, respectively, and set $I(a) = \cup_{k=-\infty}^{\infty} (x_k(a), x_k(b))$ and
$I(b) = \cup_{k=-\infty}^{\infty} (x_k(b), x_{k+1}(a))$.
Theorem 5.15 If $g \in C^2(\mathbb{R})$, then

\[
\begin{aligned}
g(r(B(t))) - g(r(B(0))) &= \int_0^t g'(r(B(s)))\,\big(1(B(s) \in I(a)) - 1(B(s) \in I(b))\big)\,dB(s) \\
&\quad + \sum_{k=-\infty}^{\infty} \big[g'(0)\,L(t; x_k(a)) - g'(b - a)\,L(t; x_k(b))\big] + \frac{1}{2} \int_0^t g''(r(B(s)))\,ds,
\end{aligned} \tag{5.49}
\]

where $L(t; x_k(a)) = \lim_{\varepsilon \downarrow 0} (1/2\varepsilon) \int_0^t 1(B(s) \in I_{k,\varepsilon}(a))\,ds$ and $L(t; x_k(b))$ is
$L(t; x_k(a))$ with $I_{k,\varepsilon}(b)$ in place of $I_{k,\varepsilon}(a)$.

Proof For proof see [7] (Sect. 6.2.3.2). Tanaka’s formula and its previous extension
cannot be used to prove (5.49) since they deal with Brownian motion processes
reflected at zero. ▲


5.5.3 Random Walk Method 


We show that solutions of some deterministic partial differential equations can be 
obtained at arbitrary points in their domains of definition directly, rather than extract- 
ing them from field solutions. The method delivering these local solutions uses sam- 
ples of diffusion processes, and is referred to as the random walk method. 

The class of partial differential equations admitting local solutions has the form

\[
\frac{\partial u(x, t)}{\partial t} = \sum_{i=1}^{d} \alpha_i(x)\,\frac{\partial u(x, t)}{\partial x_i} + \frac{1}{2} \sum_{i,j=1}^{d} \beta_{ij}(x)\,\frac{\partial^2 u(x, t)}{\partial x_i\,\partial x_j}
+ q(x, t)\,u(x, t) + p(x, t), \quad (x, t) \in D \times (0, \infty), \tag{5.50}
\]

where $D$ is an open subset of $\mathbb{R}^d$, $\alpha_i$, $\beta_{ij}$ are real-valued functions defined on
$D \subset \mathbb{R}^d$, $d \ge 1$ is an integer, and $q$, $p$ denote real-valued functions defined on
$D \times [0, \infty)$. The solution $u : D \times (0, \infty) \to \mathbb{R}$ of (5.50) depends on boundary
and initial conditions. If $q$ and $p$ depend on only $x$ and $\partial u/\partial t = 0$, (5.50) is a partial
differential equation involving only spatial coordinates. Note that (5.50) can be a
Poisson equation, heat or any other transport equation, or a Schrödinger equation
depending on the coefficients $\alpha_i(x)$ and $\beta_{ij}(x)$ and the functions $q(x, t)$ and $p(x, t)$.


5.5.3.1 Dirichlet Boundary Value Problem ($q = 0$)

Let $u(x, t)$ be the solution of (5.50) with $q = 0$, satisfying the initial and boundary
conditions $u(x, 0) = \eta(x)$, $x \in D$, and $u(x, t) = \xi(x, t)$, $x \in \partial D$, $t > 0$.

It is assumed that (1) the matrix $\beta(x) = \{\beta_{ij}(x)\}$ in (5.50) is symmetric and
positive definite admitting the representation $\beta(x) = b(x)\,b(x)'$, (2) the stochastic
differential equation

\[
\begin{cases}
dX(s) = \alpha(X(s))\,ds + b(X(s))\,dB(s) \\
dX_{d+1}(s) = -ds
\end{cases} \tag{5.51}
\]

has a unique strong solution $\tilde{X}(s) = (X(s), X_{d+1}(s)) \in \mathbb{R}^{d+1}$, $s \ge 0$, where $B$ denotes
an $\mathbb{R}^d$-valued Brownian motion, the coordinates $\alpha_i(x)$, $i = 1, \ldots, d$, of $\alpha(x)$ are given by (5.50),
and $b(x)$ is in the representation of $\beta(x)$, and (3) the boundaries of $D$ are regular, the
partial derivatives of $u$ in (5.50) are bounded in $D_t = D \times (0, t)$, $t > 0$, the functions
$\eta$ and $\xi$ are continuous in their arguments, and $p$ is Hölder continuous in $D$, that is,
there exist constants $c, a \in (0, \infty)$ such that $|p(x, t) - p(x', t)| \le c\,\|x - x'\|^a$, where
$x, x' \in D$ ([4], p. 133).

Definition 5.6 The generator of the diffusion process $\tilde{X} = (X, X_{d+1})$ in (5.51) is
the limit

\[
\tilde{\mathcal{A}}[g(x, t)] = \lim_{s \downarrow 0} \frac{E^{(x,t)}[g(\tilde{X}(s))] - g(x, t)}{s}, \tag{5.52}
\]

where $g$ is a real-valued function defined on $\mathbb{R}^{d+1}$ with continuous second order
partial derivatives in $x \in \mathbb{R}^d$ and continuous first order partial derivative in $s \ge 0$
and $E^{(x,t)}$ denotes expectation conditional on $\tilde{X}(0) = (x, t)$, $x \in D$ and $t > 0$.

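Theorem 5.16 Under the stated assumptions, the local solution of (5.50) with $q = 0$
is

\[
\begin{aligned}
u(x, t) &= E^{(x,t)}\big[u(\tilde{X}(T))\big] + E^{(x,t)}\Big[\int_0^{T} p(\tilde{X}(s))\,ds\Big] \\
&= E^{(x,t)}\big[\eta(X(t)) \mid T = t\big]\,P(T = t) + E^{(x,t)}\big[\xi(X(T), t - T) \mid T < t\big]\,P(T < t) \\
&\quad + E^{(x,t)}\Big[\int_0^{T} p(\tilde{X}(s))\,ds\Big],
\end{aligned} \tag{5.53}
\]

where $\tilde{X} = (X, X_{d+1})$ and $T = \inf\{s > 0 : \tilde{X}(s) \notin D_t\}$, $t > 0$.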

Proof We present the main idea of the proof. Technical arguments needed to support 
some of the statements in our discussion can be found in [3] (Chap. 6), [4] (Chap. 4), 
[7] (Sect. 6.2), and [14] (Chap. 9). 

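The Ito formula in (5.15) applied to the mapping $\tilde{X} \mapsto g(\tilde{X})$ gives

\[
E^{(x,t)}[g(\tilde{X}(s))] - g(\tilde{X}(0)) = E^{(x,t)}\Big[\int_0^s \Big(\sum_{i=1}^{d+1} \frac{\partial g(\tilde{X}(\sigma))}{\partial \tilde{x}_i}\,d\tilde{X}_i(\sigma)
+ \frac{1}{2} \sum_{i,j=1}^{d} \big(b(X(\sigma))\,b(X(\sigma))'\big)_{ij}\,\frac{\partial^2 g(\tilde{X}(\sigma))}{\partial x_i\,\partial x_j}\,d\sigma\Big)\Big]
\]

by averaging, where $g$ is as in (5.52), $\tilde{x}_i$ is the coordinate $i = 1, \ldots, d, d+1$ of
$\tilde{x} = (x, x_{d+1})$, and $\tilde{x}_i = x_i$ for $i = 1, \ldots, d$. An alternative form of this equation is

\[
E^{(x,t)}[g(\tilde{X}(s))] - g(x, t) = E^{(x,t)}\Big[\int_0^s \Big(\sum_{i=1}^{d} \frac{\partial g(\tilde{X}(\sigma))}{\partial x_i}\,\alpha_i(X(\sigma)) - \frac{\partial g(\tilde{X}(\sigma))}{\partial x_{d+1}}
+ \frac{1}{2} \sum_{i,j=1}^{d} \big(b(X(\sigma))\,b(X(\sigma))'\big)_{ij}\,\frac{\partial^2 g(\tilde{X}(\sigma))}{\partial x_i\,\partial x_j}\Big)\,d\sigma\Big] \tag{5.54}
\]

since the processes $\int_0^s [\partial g(\tilde{X}(\sigma))/\partial x_i]\,b_{ij}(X(\sigma))\,dB_j(\sigma)$, $i, j = 1, \ldots, d$, are
martingales starting at zero. For a small $s > 0$, the argument of the expectation on the
right side of the above equation divided by $s$ can be approximated by

\[
\sum_{i=1}^{d} \frac{\partial g(\tilde{X}(\theta(\omega)\,s, \omega))}{\partial x_i}\,\alpha_i(X(\theta(\omega)\,s, \omega)) - \frac{\partial g(\tilde{X}(\theta(\omega)\,s, \omega))}{\partial x_{d+1}}
+ \frac{1}{2} \sum_{i,j=1}^{d} \big(b(X(\theta(\omega)\,s, \omega))\,b(X(\theta(\omega)\,s, \omega))'\big)_{ij}\,\frac{\partial^2 g(\tilde{X}(\theta(\omega)\,s, \omega))}{\partial x_i\,\partial x_j}
\]

for each $\omega \in \Omega$, where $\theta(\omega) \in [0, 1]$. Since the drift and diffusion coefficients
of $\tilde{X}$ satisfy the conditions in Theorem 5.7, $\tilde{X}$ has continuous sample paths, and $g$
has continuous partial derivatives of order 1 and order 2 with respect to $x_{d+1}$ and
$x_i$, $i = 1, \ldots, d$, respectively, the limit of the above function as $s \downarrow 0$ gives

\[
\tilde{\mathcal{A}} = \sum_{i=1}^{d} \alpha_i\,\frac{\partial}{\partial x_i} - \frac{\partial}{\partial x_{d+1}} + \frac{1}{2} \sum_{i,j=1}^{d} (b\,b')_{ij}\,\frac{\partial^2}{\partial x_i\,\partial x_j} \tag{5.55}
\]

so that

\[
E^{(x,t)}[g(\tilde{X}(s))] - g(x, t) = E^{(x,t)}\Big[\int_0^s \tilde{\mathcal{A}}[g(\tilde{X}(\sigma))]\,d\sigma\Big]. \tag{5.56}
\]

Recall that calculations similar to those used here to establish the generator $\tilde{\mathcal{A}}$ of $\tilde{X}$
were performed in Example 5.7 to show that the generator of a Brownian motion is
proportional to the Laplace operator.

Note that (5.56) holds with $u$ in place of $g$ and that the local solution in (5.53)
results from (5.56) for $s = T$ since $\tilde{X}(s) \in D_t$ for $s < T$ and $\tilde{X}(T)$ is on the boundary
of $D_t$, so that $\tilde{\mathcal{A}}[u(\tilde{X}(s))] = -p(\tilde{X}(s))$ and $u(\tilde{X}(T)) = \xi(X(T), t - T)$ for $T < t$ and
$\eta(X(t))$ for $T = t$. That (5.56) can be used for $s = T$ follows from the Dynkin
formula ([14], Theorem 7.4.1). ▲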

Theorem 5.17 Under the conditions in Theorem 5.16, the local solution of the time
invariant version of (5.50) with $q = 0$ and $u(x) = \xi(x)$, $x \in \partial D$, is

\[
u(x) = E^x[\xi(X(T))] + E^x\Big[\int_0^T p(X(\sigma))\,d\sigma\Big], \tag{5.57}
\]

where $X$ is an $\mathbb{R}^d$-valued diffusion process satisfying the first $d$ equations in (5.51)
and $T = \inf\{s > 0 : X(s) \notin D\}$.

Proof Arguments similar to those used to derive (5.53) can be used to find (5.57).
The generator of $X$ is

\[
\mathcal{A} = \sum_{i=1}^{d} \alpha_i(x)\,\frac{\partial}{\partial x_i} + \frac{1}{2} \sum_{i,j=1}^{d} \big(b(x)\,b(x)'\big)_{ij}\,\frac{\partial^2}{\partial x_i\,\partial x_j} \tag{5.58}
\]

so that

\[
E^x[g(X(s))] - g(x) = E^x\Big[\int_0^s \mathcal{A}[g(X(\sigma))]\,d\sigma\Big] \tag{5.59}
\]

by Ito’s formula applied to $g(X)$ for $s \in [0, T)$. ▲

Estimates of solutions $u(x, t)$ and $u(x)$ can be calculated from (5.53) and (5.57)
using samples of $\tilde{X}$ and $X$, respectively. The estimates are satisfactory if the sample
size is sufficiently large and the numerical scheme used to generate samples of $\tilde{X}$
and $X$ is accurate. The accuracy of this scheme depends strongly on the time step $\Delta t$
that needs to be related to the gradient of the coefficients in (5.50), as illustrated by
numerical experiments in [9].
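
A minimal sketch of the random walk method for the time invariant solution (5.57) is given below for a one-dimensional test problem chosen here for illustration, $(1/2)\,u''(x) + 1 = 0$ on $D = (0, 1)$ with $u(0) = 0$ and $u(1) = 1$, so that $X = B$, $\xi(y) = y$ on the boundary, and $p = 1$. The function name `random_walk_estimate`, the Euler time stepping, and all parameter values are assumptions, not the scheme used in the text.

```python
import math
import random


def random_walk_estimate(x, xi, p, n_paths=2000, dt=1e-3, seed=4):
    """Estimate u(x) = E[xi(B(T))] + E[int_0^T p(B(s)) ds] for (1/2) u'' + p = 0 on (0, 1)."""
    rng = random.Random(seed)
    step = math.sqrt(dt)
    total = 0.0
    for _ in range(n_paths):
        b, source = x, 0.0
        while 0.0 < b < 1.0:
            source += p(b) * dt                # left-point rule for the time integral
            b += rng.gauss(0.0, step)          # Euler step of dX = dB
        exit_point = 0.0 if b <= 0.0 else 1.0  # project the overshoot onto the boundary
        total += xi(exit_point) + source
    return total / n_paths


# (1/2) u'' + 1 = 0, u(0) = 0, u(1) = 1 has the exact solution u(x) = x + x (1 - x).
estimate = random_walk_estimate(0.3, xi=lambda y: y, p=lambda y: 1.0)
exact = 0.3 + 0.3 * 0.7
```

The estimate carries a statistical error that decreases with the number of samples and a bias caused by the time step, since exits of the path between grid times are not detected; this is one reason why the time step has to be selected with care.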

Example 5.26 The local solution for

$$\sum_{i=1}^2 \alpha_i(x)\, \frac{\partial u(x)}{\partial x_i} + \frac{1}{2} \sum_{i,j=1}^2 \beta_{ij}(x)\, \frac{\partial^2 u(x)}{\partial x_i\, \partial x_j} = 0, \quad x \in D \subset \mathbb{R}^2,$$

with the boundary condition $u(x) = \xi(x)$, $x \in \partial D$, is $u(x) = E^x[u(X(T))]$ by (5.57),
where $X$ is an $\mathbb{R}^2$-valued diffusion process defined by the first $d = 2$ equations in
(5.51), $T$ is as in Theorem 5.17 such that $E^x[T] < \infty$, and $X(0) = x \in D$.

Estimates of $u(x)$ have been calculated for $D = (-2, 2) \times (-1, 1)$, $\xi(x) = 1$, $x \in (-2, 2) \times \{-1\}$,
and $\xi(x) = 0$ on the other boundaries of $D$, $\alpha_1(x) = 1$, $\alpha_2(x) = 2$,
$\beta_{11}(x) = 1.25$, $\beta_{12}(x) = \beta_{21}(x) = 1.5$, and $\beta_{22}(x) = 4.25$. The estimates are
based on 1000 independent samples of $X$ defined by

$$\begin{cases} dX_1(t) = dt + dB_1(t) + (1/2)\, dB_2(t), \\ dX_2(t) = 2\, dt + (1/2)\, dB_1(t) + 2\, dB_2(t), \end{cases}$$

and generated with a time step of $\Delta t = 0.001$, where $B_1$ and $B_2$ are independent
Brownian motions. They are 0.2720, 0.2650, and 0.2222 at $x = (0.5, 0)$, $(1.0, 0)$,
and $(1.5, 0)$, and differ from the finite element solution by less than 5%. ◊
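
The estimate in Example 5.26 can be sketched in a few lines of code. The block below is a minimal illustration rather than the scheme actually used for the quoted numbers: it assumes an Euler-Maruyama discretization, and the function name `estimate_local_solution`, the seed handling, and the rule that assigns a boundary side to a path that has overshot the boundary are all assumptions made for this sketch.

```python
import math
import random


def estimate_local_solution(x0, n_samples=1000, dt=0.001, seed=0):
    """Monte Carlo estimate of u(x0) = E^x[xi(X(T))] for the data of Example 5.26."""
    rng = random.Random(seed)
    sqrt_dt = math.sqrt(dt)
    hits = 0
    for _ in range(n_samples):
        x1, x2 = x0
        # Euler-Maruyama steps (assumed scheme) until the path leaves D = (-2, 2) x (-1, 1).
        while -2.0 < x1 < 2.0 and -1.0 < x2 < 1.0:
            db1 = rng.gauss(0.0, sqrt_dt)
            db2 = rng.gauss(0.0, sqrt_dt)
            x1 += dt + db1 + 0.5 * db2
            x2 += 2.0 * dt + 0.5 * db1 + 2.0 * db2
        # xi = 1 on (-2, 2) x {-1} and 0 on the other sides; deciding the exit side
        # from the overshot position is an assumption of this sketch.
        if x2 <= -1.0 and -2.0 < x1 < 2.0:
            hits += 1
    return hits / n_samples
```

A call such as `estimate_local_solution((0.5, 0.0))` can be compared with the value 0.2720 quoted above. With 1000 samples the standard error of an estimate near 0.27 is about 0.014, so agreement to two digits should not be expected from a single run.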

Example 5.27 Consider a bar in torsion with a multiply connected cross section $D$,
external boundary $\partial D_1$, and interior boundaries $\partial D_r$, $r = 2, \ldots, m$, delineating
cavities. The Prandtl stress function $u(x)$ for this bar is the solution of $\Delta u(x) = -2 G \theta$,
$x \in D \subset \mathbb{R}^2$, where $G$ is the modulus of elasticity in shear and $\theta$ denotes
the angle of twist.

The local solution is

$$u(x) = E^x\left[u(B(T))\right] - \frac{1}{2}\, E^x\left[\int_0^T \Delta u(B(s))\, ds\right] = \sum_{r=1}^m k_r\, P(B(T) \in \partial D_r) + G\theta\, E^x[T]$$

for the boundary conditions $u(x) = k_r$, $x \in \partial D_r$, $r = 1, \ldots, m$, where $k_r$ are
constants. One of the constants $k_r$ is arbitrary and can be set to zero, for example
$k_1 = 0$ ([1], Sect. 7-6). If the Brownian motion $B$ starts at a point on the boundary
of $D$, for example, $x \in \partial D_r$, and $D$ is regular, then $P(B(T) \in \partial D_r) = 1$,
$P(B(T) \in \partial D_q) = 0$, $q \neq r$, and $E^x[T] = 0$ so that $u(x) = k_r$, that is, the local solution
satisfies the boundary conditions exactly [5, 6].

If $D$ is a simply connected set, then $p_r(x) = P(B(T) \in \partial D_r) = 0$ for $r \geq 2$, so that
$u(x) = G\theta\, E^x[T]$. This solution has been applied to an elliptical cross section
$D = \{x : (x_1/a_1)^2 + (x_2/a_2)^2 < 1\}$ with $a_1 = 5$, $a_2 = 3$, and $G\theta = 1/2$. For
100 samples of $B$ generated with a time step $\Delta t = 0.001$ the largest error of the
estimated Prandtl function is nearly 33%. The error decreases to 2% if the sample
size is increased to 1000. Errors have been calculated relative to the exact solution
$u(x) = -a_1^2\, a_2^2\, G\theta\, (x_1^2/a_1^2 + x_2^2/a_2^2 - 1)/(a_1^2 + a_2^2)$ ([1], Sect. 7-4). ◊

Example 5.28 Let $u$ satisfy the heat equation

$$\frac{\partial u(x,t)}{\partial t} = \alpha^2 \sum_{i=1}^d \frac{\partial^2 u(x,t)}{\partial x_i^2} \tag{5.60}$$

with initial and boundary conditions giving the functions $u(x, 0)$ for $x \in D$ and $u(x, t)$
for $x \in \partial D$ and $t > 0$, respectively, where $D$ is an open bounded set in $\mathbb{R}^d$ and $\alpha > 0$
is a constant. The local solution of (5.60) is

$$u(x,t) = E^{(x,t)}\left[u(\tilde X(\tilde T))\right], \tag{5.61}$$

where $dX_i(s) = \sqrt{2}\, \alpha\, dB_i(s)$, $i = 1, \ldots, d$, and $dX_{d+1}(s) = -ds$ by (5.51). ◊

Proof Note that (5.60) is a special case of (5.50) with $\alpha_i = 0$, $\beta_{ij} = 2\, \alpha^2\, \delta_{ij}$, $q = 0$,
and $p = 0$. The generator of $\tilde X$ is $\mathcal{A} = -\partial/\partial x_{d+1} + \alpha^2 \Delta$ and the steady-state
solution $u_s(x) = \lim_{t \to \infty} u(x, t)$ satisfies the Laplace equation $\Delta u_s = 0$. ▲


5.5.3.2 Dirichlet Boundary Value Problem (q ≠ 0)

The local solution for this general form of (5.50) involves the Feynman-Kac func- 
tional. Results in the previous section can be viewed as a special case of those 
discussed here. 

Theorem 5.18 Let $p, \zeta : \mathbb{R}^d \to \mathbb{R}$ be functions such that $p$ has a compact support
and continuous second order partial derivatives and $\zeta$ is a bounded Borel measurable
function. Let $X$ be an $\mathbb{R}^d$-valued diffusion process defined by $dX(t) = a(X(t))\, dt +
b(X(t))\, dB(t)$, $t \geq 0$, that has a unique solution that belongs a.s. to the support of
$p$. The Feynman-Kac functional

$$v(x, t) = E^x\left[p(X(t))\, \exp\left(\int_0^t \zeta(X(s))\, ds\right)\right] \tag{5.62}$$

is the solution of the partial differential equation

$$\frac{\partial v}{\partial t} = \mathcal{A}[v] + \zeta\, v \tag{5.63}$$

with $v(x, 0) = p(x)$ and some boundary conditions, where $\mathcal{A}$ denotes the generator
of $X(t)$.

Proof A sketch of the proof can be found in [7] (Sect. 6.2.2). Additional technical
considerations are in [14] (Theorem 8.2.1). ▲
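
The Feynman-Kac functional lends itself to a direct Monte Carlo check. The sketch below is illustrative only: it treats a scalar diffusion ($d = 1$), uses an Euler-Maruyama scheme with a left-point rule for the time integral, and the function name `feynman_kac` and its arguments are hypothetical. In the usage example the terminal function is a Gaussian bump, which decays rapidly but does not have compact support, so it is a stand-in for the function $p$ of the theorem.

```python
import math
import random


def feynman_kac(x, t, p, zeta, a, b, n_samples=4000, n_steps=50, seed=0):
    """Estimate v(x, t) = E^x[p(X(t)) exp(int_0^t zeta(X(s)) ds)] for a scalar diffusion
    dX = a(X) dt + b(X) dB started at x."""
    rng = random.Random(seed)
    dt = t / n_steps
    sqrt_dt = math.sqrt(dt)
    total = 0.0
    for _ in range(n_samples):
        state = x
        integral = 0.0
        for _step in range(n_steps):
            integral += zeta(state) * dt                                   # left-point rule
            state += a(state) * dt + b(state) * rng.gauss(0.0, sqrt_dt)    # Euler-Maruyama step
        total += p(state) * math.exp(integral)
    return total / n_samples
```

For a Brownian motion ($a = 0$, $b = 1$), a constant $\zeta = c$, and $p(y) = \exp(-y^2/2)$, the functional is available in closed form, $v(x,t) = e^{ct} (1+t)^{-1/2} \exp(-x^2/(2(1+t)))$, which satisfies $\partial v/\partial t = (1/2)\, \partial^2 v/\partial x^2 + c\, v$ and can be used to validate the estimate.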

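Theorem 5.19 The local solution of (5.50) is

$$u(x, t) = E^{(x,t)}\left[u(\tilde X(\tilde T))\, \exp\left(\int_0^{\tilde T} q(\tilde X(\sigma))\, d\sigma\right)\right] + E^{(x,t)}\left[\int_0^{\tilde T} p(\tilde X(s))\, \exp\left(\int_0^s q(\tilde X(\sigma))\, d\sigma\right) ds\right] \tag{5.64}$$

for $x \in D$ and $t > 0$, where $\tilde X$ is given by (5.51), $\tilde T$ is as in (5.53), and $p$, $\xi$, and $q$
are such that (5.62) and (5.63) apply.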

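Proof Consider an $\mathbb{R}^{d+2}$-valued diffusion process $(\tilde X, X_{d+2}) = (X, X_{d+1}, X_{d+2} = Z)$
defined by the stochastic differential equation

$$\begin{aligned} dX(s) &= a(X(s))\, ds + b(X(s))\, dB(s), \\ dX_{d+1}(s) &= -ds, \\ dZ(s) &= q(X(s), s)\, Z(s)\, ds, \end{aligned}$$

for $s \geq 0$ with initial conditions $X(0) = x \in D$, $X_{d+1}(0) = t > 0$, and $X_{d+2}(0) =
Z(0) = 1$. The generator of $(\tilde X, Z) = (X, X_{d+1}, X_{d+2} = Z)$ is

$$\mathcal{A}^* = -\frac{\partial}{\partial x_{d+1}} + \sum_{i=1}^d a_i\, \frac{\partial}{\partial x_i} + q\, x_{d+2}\, \frac{\partial}{\partial x_{d+2}} + \frac{1}{2} \sum_{i,j=1}^d (b\, b^T)_{ij}\, \frac{\partial^2}{\partial x_i\, \partial x_j},$$

so that

$$\mathcal{A}^*\left[g(x, s, z)\right] = z \left[-\frac{\partial}{\partial x_{d+1}} + \sum_{i=1}^d a_i(x)\, \frac{\partial}{\partial x_i} + q(x, s) + \frac{1}{2} \sum_{i,j=1}^d \big(b(x)\, b(x)^T\big)_{ij}\, \frac{\partial^2}{\partial x_i\, \partial x_j}\right] \psi(x, x_{d+1})$$

for $g(x, x_{d+1}, x_{d+2} = z) = \psi(x, x_{d+1})\, z$ assumed to have continuous first order
partial derivatives in $s$ and $z$ and continuous second order partial derivatives in $x$.
For $g(\tilde X, Z) = u(\tilde X)\, Z$, the Ito formula gives

$$E^{(x,t)}\left[u(\tilde X(\tilde T))\, Z(\tilde T)\right] - u(\tilde X(0))\, Z(0) = E^{(x,t)}\left[\int_0^{\tilde T} \mathcal{A}^*\left[u(\tilde X(s))\, Z(s)\right] ds\right]$$

by averaging. This formula yields (5.64) since $(\tilde X, Z)$ starts at $(x, t, 1)$ so that
$u(\tilde X(0))\, Z(0) = u(x, t)$, $X(s) \in D$ for $s \in (0, \tilde T)$, and

$$\int_0^{\tilde T} \mathcal{A}^*\left[u(\tilde X(s))\, Z(s)\right] ds = -\int_0^{\tilde T} p(X(s), s)\, Z(s)\, ds$$

for any $(x, t)$ in $D \times (0, \infty)$. Technicalities on these derivatives can be found in [3]
(Sect. 6.4). ▲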

Example 5.29 Consider the special case of the time-invariant Schrödinger equation
obtained from (5.50) with $q(x) = q$, $p(x) = 0$, $\alpha_i = 0$, and $\beta_{ij} = 2\, \delta_{ij}$. The local
solution for the boundary condition $u(x) = \xi(x)$ is

$$u(x) = E^x\left[\xi(\sqrt{2}\, B(T))\, e^{q T}\right], \quad x \in D,$$

where $B$ is an $\mathbb{R}^d$-valued Brownian motion starting at $x \in D$ and $T = \inf\{t >
0 : \sqrt{2}\, B(t) \notin D,\ x \in D\}$ is a stopping time. Estimates of $u(x)$ for $D = \{x \in
\mathbb{R}^2 : x_1^2 + x_2^2 < 4\}$, $q = -\sum_{i=1}^2 a_i^2$, and $\xi(x) = \exp(a_0 + a_1 x_1 + a_2 x_2)$ with
$a_0 = 0.5$, $a_1 = 0.3$, and $a_2 = 0.3$ are in error relative to the exact solution $u(x) =
\exp(a_0 + a_1 x_1 + a_2 x_2)$ by less than 1.5% if based on 500 independent samples of $B$
generated at a time step $\Delta t = 0.001$. ◊
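
A possible implementation of the estimate in Example 5.29 is sketched below. It is not the code behind the quoted errors. The sketch assumes an Euler discretization of $X(t) = x + \sqrt{2}\, B(t)$ with $B(0) = 0$, so that $X(0) = x$, and it projects the overshot exit point back onto the circle before evaluating $\xi$; both choices, as well as the name `schrodinger_estimate`, are assumptions.

```python
import math
import random


def schrodinger_estimate(x0, q, xi, radius=2.0, n_samples=500, dt=0.001, seed=0):
    """Estimate u(x0) = E[xi(X(T)) exp(q T)] for X = x0 + sqrt(2) B in a disk of given radius."""
    rng = random.Random(seed)
    step = math.sqrt(2.0 * dt)          # standard deviation of each coordinate increment of X
    total = 0.0
    for _ in range(n_samples):
        x1, x2 = x0
        t = 0.0
        while x1 * x1 + x2 * x2 < radius * radius:
            x1 += rng.gauss(0.0, step)
            x2 += rng.gauss(0.0, step)
            t += dt
        # Project the exit point onto the boundary circle (assumed treatment of overshoot).
        r = math.hypot(x1, x2)
        total += xi(radius * x1 / r, radius * x2 / r) * math.exp(q * t)
    return total / n_samples
```

With `q = -(0.3**2 + 0.3**2)` and `xi = lambda y1, y2: math.exp(0.5 + 0.3 * y1 + 0.3 * y2)`, the value returned at a point `x0` inside the disk can be compared with the exact solution $\exp(a_0 + a_1 x_1 + a_2 x_2)$ of the example.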


5.5.3.3 Mixed Boundary Value Problem 

Local solutions for partial differential equations with Neumann and Dirichlet bound- 
ary conditions use diffusion processes similar to those for Dirichlet boundary value 
problems in the interior of their domain of definition D. However, the processes 
solving mixed boundary value problems need to be reflected back in D when they 
reach Neumann boundaries since the solution is not specified on these boundaries. 

We have already applied the random walk method in Example 5.25 to solve an ele- 
mentary mixed boundary value problem locally. The following example applies this 
method to calculate the effective conductivity for a heterogeneous material specimen. 

Example 5.30 The two-dimensional rectangular specimen in Fig. 5.2 is subjected to
a voltage source with unit intensity. The potential $\phi(x)$ in the specimen satisfies the
partial differential equation

$$\sum_{p=1}^2 \frac{\partial \Sigma(x)}{\partial x_p}\, \frac{\partial \phi(x)}{\partial x_p} + \Sigma(x)\, \Delta \phi(x) = \nabla \cdot \big(\Sigma(x)\, \nabla \phi(x)\big) = 0, \quad x \in D, \tag{5.65}$$

in $D = (0, l_1) \times (0, l_2)$ with the boundary conditions $\phi(0, x_2) = 0$ and $\phi(l_1, x_2) = 1$
for $x_2 \in (0, l_2)$, and $\partial \phi(x_1, x_2)/\partial x_2 = 0$ on $(0, l_1) \times \{0\}$ and $(0, l_1) \times \{l_2\}$, where
$\Sigma(x) > 0$, $x \in D$, denotes material conductivity. It is assumed that the conductivity
field $\Sigma$ is deterministic and has continuous first order partial derivatives.

Fig. 5.2 Two-dimensional heterogeneous specimen with unit thickness ($\phi = 0$ and $\phi = 1$ on the sides $x_1 = 0$ and $x_1 = l_1$, $\partial \phi/\partial x_2 = 0$ on the other two sides, $\Sigma(x_1, x_2)$ = conductivity)

Analytical solutions for (5.65) are available in few cases. For example, if $\Sigma(x) =
\Sigma$ is a constant, that is, the material is homogeneous, the potential satisfies the
differential equation $\Delta \phi(x) = 0$, $x \in D$, so that $\phi(x) = x_1/l_1$, $x \in D$. The current
passing through the specimen in Fig. 5.2 is

$$I_{\text{heter}} = \int_{\{x_1\} \times (0, l_2)} \Sigma(x)\, \frac{\partial \phi(x)}{\partial x_1}\, dx_2, \quad x_1 \in [0, l_1], \tag{5.66}$$

for a heterogeneous specimen with conductivity $\Sigma(x)$ and $I_{\text{homog}} = \Sigma\, l_2/l_1$ for a
homogeneous specimen with a constant conductivity $\Sigma$. The effective or apparent
conductivity $\Sigma_{\text{eff}}$ for the heterogeneous specimen is the conductivity of a virtual
homogeneous specimen with conductivity such that $I_{\text{heter}} = I_{\text{homog}}$, which gives

$$\Sigma_{\text{eff}} = \frac{l_1}{l_2} \int_{\{x_1\} \times (0, l_2)} \Sigma(x)\, \frac{\partial \phi(x)}{\partial x_1}\, dx_2, \quad x_1 \in [0, l_1]. \tag{5.67}$$

Since the effective conductivity $\Sigma_{\text{eff}}$ depends on the gradient $\partial \phi/\partial x_1$ of $\phi$ on a line
$\{x_1\} \times (0, l_2)$, it can be calculated approximately from

$$\Sigma_{\text{eff}} \simeq \frac{l_1}{l_2}\, \Delta x_2 \sum_{k=0}^{n_2 - 1} \Sigma(l_1 - \varepsilon, x_{2,k})\, \frac{1 - \phi(l_1 - \varepsilon, x_{2,k})}{\varepsilon}, \tag{5.68}$$

where $x_{2,k} = (k + 1/2)\, \Delta x_2$, $k = 0, 1, \ldots, n_2 - 1$, $\Delta x_2 = l_2/n_2$, $n_2 \geq 1$ is an
integer, and $0 < \varepsilon \ll l_1$ is a constant. The random walk method can be used to
estimate $\phi(l_1 - \varepsilon, x_{2,k})$.

Let $\tilde X$ be an $\mathbb{R}^2$-valued process with coordinates $\tilde X_1 = X_1$ and $\tilde X_2 = r(X_2)$, where
$X = (X_1, X_2)$ is a diffusion process defined by the stochastic differential equation

$$\begin{aligned} dX_1(t) &= a_1(X(t))\, dt + b(X(t))\, dB_1(t), \\ dX_2(t) &= a_2(X(t))\, dt + b(X(t))\, dB_2(t), \end{aligned} \tag{5.69}$$

where $a_k(x) = \partial \Sigma(x)/\partial x_k$, $k = 1, 2$, $b(x) = (2\, \Sigma(x))^{1/2}$, and $r : \mathbb{R} \to \mathbb{R}$
is a periodic function with period $2\, l_2$ defined by $r(x_2) = |x_2|$ for $x_2 \in [-l_2, l_2]$.
Figure 5.3 shows $r(x_2)$ in the range $[-3, 3]$ for $l_2 = 1$. The generator of $\tilde X$ coincides
with that of $X$ in $D$, and matches the differential operator in (5.65). The essential
difference between $X$ and $\tilde X$ relates to their behavior at the Neumann boundaries.
$X$ can exit these boundaries while $\tilde X$ is reflected back in $D$ so that it can exit only
through the Dirichlet boundaries of $D$. The local solution for (5.65) is

$$\phi(x) = E^x\left[\phi(\tilde X(T^*))\right], \quad x \in D, \tag{5.70}$$

where $T^* = \inf\{t > 0 : \tilde X(t) \notin D,\ \tilde X(0) = x \in D\}$.

Fig. 5.3 Function $r(x_2)$ in the range $[-3, 3]$ for $l_2 = 1$
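
The periodic function $r$ that folds the second coordinate back into $[0, l_2]$ can be written compactly. The sketch below is one way to code it; the function name `reflect` is hypothetical, and the map is applied to the second coordinate of a sample of $X$ to obtain the corresponding coordinate of the reflected process.

```python
def reflect(x2, l2):
    """Periodic map r with period 2*l2 such that r(x2) = |x2| for x2 in [-l2, l2]."""
    period = 2.0 * l2
    # Use the periodicity to shift x2 into [-l2, l2), then fold with the absolute value.
    shifted = (x2 + l2) % period - l2
    return abs(shifted)
```

For $l_2 = 1$, a path value $x_2 = -1.3$, which has crossed the Neumann boundary $x_2 = 0$ and then the level $-l_2$, is mapped to a point inside $(0, l_2)$, as expected for a process reflected at both Neumann boundaries.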

Estimates for $\Sigma_{\text{eff}}$ have been obtained from (5.68) with $\phi(l_1 - \varepsilon, x_{2,k})$ calculated
from (5.70) for a material specimen in $D = (0, 10) \times (0, 4)$ with conductivity field



(5.71) 


for $q_1 = 3$ and $q_2 = 2$. The estimates can be unsatisfactory if the time step $\Delta t$
used to generate samples of $\tilde X$ is relatively large, for example, $\Sigma_{\text{eff}} = 2.73$ for
$\varepsilon = 0.1$, $\Delta t = 0.001$, $n_2 = 5$, and $n = 1000$, an error of +36% with respect to
the finite element solution. The estimates of $\Sigma_{\text{eff}}$ improve by decreasing the time
step $\Delta t$ and/or increasing $n_2$, for example, $\Sigma_{\text{eff}} = 2.09$ (error +4.6%) for $\varepsilon =
0.1$, $\Delta t = 0.0001$, $n_2 = 5$, and $n = 1000$, and $\Sigma_{\text{eff}} = 1.9619$ (error −1.91%)
for $\varepsilon = 0.05$, $\Delta t = 0.00001$, $n_2 = 20$, and $n = 1000$. A useful discussion on the
construction of the local solution for (5.65) when $\Sigma(x)$ has discontinuities can be
found in [8]. ◊
5.5.4 Girsanov’s Theorem


Prior to stating and proving Girsanov’s theorem, we define an exponential process, 
give properties of this process, and discuss transformations of probability measures. 
The presentation is based on developments in [12] (Sects. 8.7, 8.8, 8.9). 

Let $B$ be a Brownian motion on a probability space $(\Omega, \mathcal{F}, P)$ and let $Y$ be
a member of the space $\mathcal{H}[0, \tau]$ of real-valued processes that are adapted to the
Brownian filtration $\mathcal{F}_t = \sigma(B(s) : 0 \leq s \leq t)$ and are $\mathcal{B}[0, \tau] \times \mathcal{F}$-measurable
such that $P\big(\int_0^\tau Y(t)^2\, dt < \infty\big) = 1$ (Sect. 4.4).

Definition 5.7 The process

$$Z(t) = \exp\left[\int_0^t Y(s)\, dB(s) - \frac{1}{2} \int_0^t Y(s)^2\, ds\right], \quad t \in [0, \tau], \tag{5.72}$$

with $Y \in \mathcal{H}[0, \tau]$ is called exponential process.

Example 5.31 Consider the special case of (5.72) with $h \in L^2[0, \tau]$ in place of $Y$.
Then

$$Z(t) = \exp[V(t)] = \frac{\exp\big(\int_0^t h(s)\, dB(s)\big)}{E\big[\exp\big(\int_0^t h(s)\, dB(s)\big)\big]}, \quad \text{where} \quad V(t) = \int_0^t h(s)\, dB(s) - \frac{1}{2} \int_0^t h(s)^2\, ds. \tag{5.73}$$

Alternative definitions of this process are $dZ(t) = h(t)\, Z(t)\, dB(t)$ with $Z(0) = 1$ and
$Z(t) = 1 + \int_0^t h(s)\, Z(s)\, dB(s)$, and show that $Z(t)$ is an $\mathcal{F}_t$-martingale. ◊

Proof The expression of $Z(t)$ in (5.73) holds since $\int_0^t h(s)\, dB(s)$ is a Gaussian variable
with mean 0 and variance $\int_0^t h(s)^2\, ds$. Ito’s formula applied to the mapping
$V(t) \mapsto Z(t) = \exp(V(t))$ gives $dZ(t) = Z(t)\, dV(t) + (1/2)\, Z(t)\, d[V, V](t)$ so
that $dZ(t) = h(t)\, Z(t)\, dB(t)$ since $d[V, V](t) = h(t)^2\, dt$. The integral form of
this equation is $Z(t) - 1 = \int_0^t h(s)\, Z(s)\, dB(s)$ since $Z(0) = 1$ so that $Z(t)$ is an
$\mathcal{F}_t$-martingale in $[0, \tau]$ by Theorem 4.4. ▲

Theorem 5.20 If $Y$ and $Z$ are as in Definition 5.7 and $E[Z(t)] = 1$ for all $t \in [0, \tau]$,
then $Z(t)$ is an $\mathcal{F}_t$-martingale in $[0, \tau]$.

Proof Note first that $E[Z(t)] = E[Z(0)] = 1$ for all $t \in [0, \tau]$ if $Z(t)$ is the process
in Example 5.31. The condition $E[Z(\tau)] = 1$ is difficult to verify for $Z(t)$ in (5.72).
It is preferable to use the requirement $E\big[\exp\big(\int_0^\tau Y(t)^2\, dt/2\big)\big] < \infty$, referred to as
the Novikov condition, that is stronger than $E[Z(t)] = 1$, $t \in [0, \tau]$.

The set function $Q$ defined by $dQ = Z(\tau)\, dP$, or its integral form $Q(A) =
\int_A Z(\tau)\, dP$, is a probability measure since it is countably additive and $Q(\Omega) =
\int_\Omega Z(\tau)\, dP = E[Z(\tau)] = 1$ by assumption. Also, the probability measures $Q$ and $P$
are equivalent since $Q(A) = 0$ if and only if $P(A) = 0$, $A \in \mathcal{F}$.

It can be shown that $Q_t$ defined by $dQ_t = Z(t)\, dP$ is the restriction of $Q$ to $\mathcal{F}_t$.
For $s \leq t$ and $A \in \mathcal{F}_s$ arbitrary, we have $\int_A E_P[Z(t) \mid \mathcal{F}_s]\, dP = \int_A Z(t)\, dP =
\int_A dQ_t = Q(A)$ by properties of the conditional expectation and the definition of
$Q_t$, which implies $E_P[Z(t) \mid \mathcal{F}_s] = Z(s)$ since $Q(A) = \int_A dQ_s = \int_A Z(s)\, dP$. We
have $E_P[Z(t) \mid \mathcal{F}_s] = Z(s)$ for $A \in \mathcal{F}_s$ arbitrary, $E[Z(t)]$ exists and is finite, and
$Z(t) \in \mathcal{F}_t$ so that $Z(t)$ is a $P$-martingale in $[0, \tau]$. ▲

Theorem 5.21 Let $B(t)$, $0 \leq t \leq 1$, be a Brownian motion with respect to a probability
measure $P$ and let $Q$ be defined by $dQ = \exp(B(1) - 1/2)\, dP$. Then $M(t) = B(t) - t$
is a Brownian motion with respect to $Q$ and

$$\int_\Omega h(B(t) - t)\, dQ = \int_\Omega h(B(t))\, dP, \tag{5.74}$$

or, equivalently, $E_Q[h(B(t) - t)] = E_P[h(B(t))]$, where $h$ is a function such that
$E_P[|h(B(t))|] < \infty$.

Proof That $Q$ is a probability measure results directly from its definition. The process
$M(t) = B(t) - t$ is a Brownian motion under $Q$ since it starts at zero, has continuous
samples, and has stationary Gaussian increments. That $M(t) - M(s) \sim N(0, t - s)$, $t \geq s$,
follows from $E_Q\big[e^{i u (M(t) - M(s))}\big] = e^{-i u (t - s)}\, E_Q\big[e^{i u (B(t) - B(s))}\big]$ and

$$\begin{aligned} E_Q\big[e^{i u (B(t) - B(s))}\big] &= E_P\big[e^{i u (B(t) - B(s))}\, dQ/dP\big] = E_P\big[e^{i u (B(t) - B(s))}\, e^{B(1) - 1/2}\big] \\ &= E_P\big\{E_P\big[e^{i u (B(t) - B(s))}\, e^{B(1) - 1/2} \mid \mathcal{F}_t\big]\big\} = E_P\big\{e^{i u (B(t) - B(s))}\, E_P\big[e^{B(1) - 1/2} \mid \mathcal{F}_t\big]\big\} \\ &= E_P\big\{e^{i u (B(t) - B(s))}\, e^{B(t) - t/2}\big\} = e^{-t/2}\, E_P\big\{e^{(i u + 1)(B(t) - B(s))}\, e^{B(s)}\big\} \\ &= e^{-t/2}\, E_P\big\{e^{(i u + 1)(B(t) - B(s))}\big\}\, E_P\big\{e^{B(s)}\big\} = e^{(-u^2 + 2 i u)(t - s)/2}, \end{aligned}$$

so that $E_Q\big[e^{i u (M(t) - M(s))}\big] = \exp(-u^2 (t - s)/2)$, that is, $M(t) - M(s) \sim N(0, t - s)$
under $Q$. These calculations have used the fact that $\exp(B(t) - t/2)$ is an $\mathcal{F}_t$-martingale
(Exercise 5.14). Similar arguments show that $M(t)$ has independent increments.
We conclude that $M$ is a Brownian motion with respect to $Q$.

To prove (5.74), note that for an arbitrary deterministic function $a(t)$ we have

$$\begin{aligned} E_P\big[h(B(t) - a(t))\, e^{B(1) - 1/2}\big] &= E_P\big\{E_P\big[h(B(t) - a(t))\, e^{B(1) - 1/2} \mid \mathcal{F}_t\big]\big\} \\ &= E_P\big\{h(B(t) - a(t))\, E_P\big[e^{B(1) - 1/2} \mid \mathcal{F}_t\big]\big\} = E_P\big\{h(B(t) - a(t))\, e^{B(t) - t/2}\big\} \\ &= \int_{-\infty}^{\infty} h(x - a(t))\, e^{x - t/2}\, \varphi(x/\sqrt{t})/\sqrt{t}\, dx = E_P\big\{h(B(t))\, e^{(t - a(t))\, B(t)/t - (t - a(t))^2/(2 t)}\big\}, \end{aligned}$$

where $\varphi$ denotes the density of $N(0, 1)$ and the latter equality follows by using the change of variables $y = x - a(t)$.
Formula (5.74) results by setting $a(t) = t$. ▲
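
Formula (5.74) can be checked numerically by writing the expectation under $Q$ as a weighted average under $P$, with weight $dQ/dP = \exp(B(1) - 1/2)$. The sketch below is only an illustration; the function name `check_change_of_measure` and the sampling of $B(1)$ from $B(t)$ by an independent Gaussian increment are choices made here.

```python
import math
import random


def check_change_of_measure(h, t, n_samples=20000, seed=0):
    """Return Monte Carlo estimates of E_Q[h(B(t) - t)] and E_P[h(B(t))] for 0 < t <= 1."""
    rng = random.Random(seed)
    lhs = 0.0
    rhs = 0.0
    for _ in range(n_samples):
        b_t = rng.gauss(0.0, math.sqrt(t))                # B(t) under P
        b_1 = b_t + rng.gauss(0.0, math.sqrt(1.0 - t))    # B(1) = B(t) + independent increment
        weight = math.exp(b_1 - 0.5)                      # dQ/dP = exp(B(1) - 1/2)
        lhs += h(b_t - t) * weight                        # E_Q[.] as a weighted P-average
        rhs += h(b_t)
    return lhs / n_samples, rhs / n_samples
```

For $h(y) = y^2$ both estimates should be close to $E_P[B(t)^2] = t$. The weighted estimate has the larger variance, since the weight is lognormal.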

Theorem 5.22 (Girsanov’s theorem) Let $(\Omega, \mathcal{F}, P)$ be a probability space, $Y \in
\mathcal{H}[0, \tau]$, and $Z(t)$ as in Theorem 5.20. Then $W(t) = B(t) - \int_0^t Y(s)\, ds$ is a Brownian
motion with respect to probability measure $Q(A) = \int_A Z(\tau)\, dP$, $A \in \mathcal{F}$.

Proof First note that the probability measures $Q$ and $P$ are equivalent. Ito’s formula
applied to $Z(t)$ and $W(t)\, Z(t)$ gives $dZ(t) = Y(t)\, Z(t)\, dB(t)$ and $d(W(t)\, Z(t)) = (1 +
Y(t)\, W(t))\, Z(t)\, dB(t)$, so that $W(t)\, Z(t) = \int_0^t (1 + Y(s)\, W(s))\, Z(s)\, dB(s)$, implying
that $W(t)\, Z(t)$, $0 \leq t \leq \tau$, is a local $P$-martingale (Sect. 4.4.3). It can be shown that
$W(t)\, Z(t)$, $0 \leq t \leq \tau$, is a $P$-martingale, so that $W(t)$, $0 \leq t \leq \tau$, is a $Q$-martingale
(Exercise 5.16).

The integral form of Ito’s formula applied to $(W(t)^2 - t)\, Z(t)$ gives

$$\big(W(t)^2 - t\big)\, Z(t) = \int_0^t \big(2\, W(s) + (W(s)^2 - s)\, Y(s)\big)\, Z(s)\, dB(s),$$

so that $(W(t)^2 - t)\, Z(t)$ is a $P$-martingale, which implies that $W(t)^2 - t$ is a $Q$-martingale
(Exercise 5.16). Since $W(t)^2 = (W(t)^2 - t) + t$ and $W(t)^2 - t$ is a
$Q$-martingale, we have $\langle W \rangle(t) = t$ a.s., so that $W(t)$ is a Brownian motion with
respect to $Q$ (Example 5.8). ▲


Example 5.32 Let $B(t)$ be a Brownian motion on a probability space $(\Omega, \mathcal{F}, P)$. The
stochastic process $W(t) = B(t) - \int_0^t h(s)\, ds$, $h \in L^2[0, \tau]$, is a Brownian motion
with respect to the probability measure $Q$ defined by $dQ = \exp\big[\int_0^\tau h(s)\, dB(s) -
(1/2) \int_0^\tau h(s)^2\, ds\big]\, dP$. ◊

Proof We have $E\big[\int_0^\tau h(s)^2\, ds\big] = \int_0^\tau h(s)^2\, ds < \infty$ by assumption, so that $h \in
\mathcal{H}[0, \tau]$. Since $Z(t)$ in (5.73) satisfies the conditions of Girsanov’s theorem, $W(t) =
B(t) - \int_0^t h(s)\, ds$ is a Brownian motion with respect to $Q$. ▲

Example 5.33 Let $X$ be a diffusion process defined by $dX(t) = a\, dt + b\, dB(t)$, $t \in
[0, \tau]$, with $X(0) = x$, where $a$ and $b \neq 0$ are constants and $B(t)$ is a Brownian
motion on a probability space $(\Omega, \mathcal{F}, P)$. Then $W(t) = B(t) + a\, t/b$ is a Brownian
motion with respect to the probability measure $Q$ defined by $dQ = Z(\tau)\, dP$, where
$Z(t) = \exp[-(a/b)\, B(t) - a^2\, t/(2 b^2)]$. The probability of $\{X(t) \leq \xi\}$ is
$\Phi\big((\xi - x - a\, t)/(b \sqrt{t})\big)$, and can be calculated under both $P$ and $Q$. ◊

Proof Under $P$, $B(t)$ is a Brownian motion by assumption so that $X(t) = x + a t +
b B(t) \sim N(x + a t, b^2 t)$ and $P(X(t) \leq \xi) = \Phi\big((\xi - x - a\, t)/(b \sqrt{t})\big)$.

The Girsanov theorem with $Y(t) = -a/b$ and

$$Z(t) = \exp\left[\int_0^t Y(s)\, dB(s) - \frac{1}{2} \int_0^t Y(s)^2\, ds\right] = \exp\left[-(a/b)\, B(t) - a^2\, t/(2 b^2)\right]$$

implies that $W(t) = B(t) - \int_0^t Y(s)\, ds = B(t) + a t/b$ is a Brownian motion under
$Q$, so that $X(t) = x + a t + b B(t) = x + b W(t) \sim N(x, b^2 t)$ under $Q$. Since
$dP/dQ_t = 1/Z(t) = \exp[(a/b)\, B(t) + a^2\, t/(2 b^2)]$ and $a t + b B(t) = b W(t)$, we
have

$$\begin{aligned} P(X(t) \leq \xi) &= E_P\big[1(X(t) \leq \xi)\big] = E_Q\left[1(x + b W(t) \leq \xi)\, \frac{dP}{dQ_t}\right] \\ &= E_Q\left[1(x + b W(t) \leq \xi)\, \exp\left(\frac{a}{b}\, B(t) + \frac{a^2 t}{2 b^2}\right)\right] \\ &= E_Q\left[1(x + b W(t) \leq \xi)\, \exp\left(\frac{a}{b}\, W(t) - \frac{a^2 t}{2 b^2}\right)\right]. \end{aligned}$$

This shows that $P(X(t) \leq \xi)$ can be calculated from

$$\frac{1}{\sqrt{2 \pi t}} \int_{-\infty}^{(\xi - x)/b} \exp\left[-(\sigma - a\, t/b)^2/(2 t)\right] d\sigma = \Phi\big(((\xi - x) - a t)/(b \sqrt{t})\big)$$

since $W(t) \sim N(0, t)$ under $Q$. ▲

Example 5.34 Suppose $X$ is defined by $dX(t) = -\alpha X(t)\, dt + \sqrt{2 \alpha}\, dB(t)$, where
$X(0) = x$, $\alpha > 0$, and $B(t)$ is a Brownian motion on a probability space $(\Omega, \mathcal{F}, P)$. The
probability $p_f(\tau) = P(\max_{0 \leq t \leq \tau} X(t) > x_{cr})$ can be estimated by

$$\hat p_{f,MC}(\tau) = \frac{1}{n} \sum_{i=1}^n 1\Big(\max_{0 \leq t \leq \tau} x_i(t) > x_{cr}\Big),$$

where $\{x_i(t)\}$ are independent samples of $X(t)$ and $x_{cr}$ is a specified threshold. If
very few samples of $X(t)$ exceed $x_{cr}$, the estimate will be inefficient. It can be
improved by measure change. For example, set $Y(t) = h(t) = \gamma \sqrt{2 \alpha}$ in (5.73),
where $\gamma$ is an arbitrary constant, so that $Z(t) = \exp[\gamma \sqrt{2 \alpha}\, B(t) - \gamma^2 \alpha\, t]$ and
$dQ_t = Z(t)\, dP$, $t \in [0, \tau]$. The process $W(t) = B(t) - \gamma \sqrt{2 \alpha}\, t$ is a Brownian
motion in $[0, \tau]$ under $Q = Q_\tau$ so that $dX(t) = (-\alpha X(t) + 2 \alpha \gamma)\, dt + \sqrt{2 \alpha}\, dW(t)$
is a diffusion process under $Q$ driven by the Brownian motion $W$. Accordingly, $p_f(\tau)$
can be estimated by

$$\hat p_{f,IS}(\tau) = \frac{1}{n} \sum_{i=1}^n 1\Big(\max_{0 \leq t \leq \tau} x_i(t) > x_{cr}\Big)\, \frac{dP}{dQ}(x_i),$$

where $\{x_i(t)\}$ are independent samples of $X(t)$ under $Q$ since

$$E_P\Big[1\Big(\max_{0 \leq t \leq \tau} X(t) > x_{cr}\Big)\Big] = E_Q\Big[1\Big(\max_{0 \leq t \leq \tau} X(t) > x_{cr}\Big)\, \frac{dP}{dQ}(X)\Big].$$

The estimates $\hat p_{f,MC}(\tau)$ are zero if based on $n$ = 500, 1,000, 5,000, and 10,000
independent samples of $X(t)$, and $0.5 \times 10^{-4}$ if based on $n$ = 100,000 independent
samples of this process. Corresponding estimates $\hat p_{f,IS}(\tau)$ with $\gamma = 4$ are $0.2340 \times
10^{-4}$, $0.3384 \times 10^{-4}$, $0.3618 \times 10^{-4}$, $0.3402 \times 10^{-4}$, and $0.3981 \times 10^{-4}$ for
$n$ = 500, 1,000, 5,000, 10,000, and 100,000 samples, respectively. These numerical
results are for $\alpha = 1$, $x_{cr} = 2.5$, and $\tau = 1$. ◊
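
The two estimators of Example 5.34 can be compared with a short simulation. The sketch below is an illustration under stated assumptions, not a reproduction of the quoted numbers: it uses an Euler-Maruyama scheme, the hypothetical function name `first_passage_estimates`, and illustrative parameter values in the usage note. The weight is $dP/dQ = 1/Z(\tau)$ written in terms of $W$, that is, $\exp(-\gamma \sqrt{2\alpha}\, W(\tau) - \gamma^2 \alpha\, \tau)$, which follows from $B(\tau) = W(\tau) + \gamma \sqrt{2\alpha}\, \tau$.

```python
import math
import random


def first_passage_estimates(alpha, x0, x_cr, tau, gamma, n_samples=2000, dt=0.01, seed=0):
    """Return (direct, importance-sampling) estimates of P(max_{0<=t<=tau} X(t) > x_cr)."""
    rng = random.Random(seed)
    n_steps = int(round(tau / dt))
    sqrt_dt = math.sqrt(dt)
    sig = math.sqrt(2.0 * alpha)
    direct = 0.0
    weighted = 0.0
    for _ in range(n_samples):
        # Direct Monte Carlo: simulate X under P.
        x, crossed = x0, False
        for _step in range(n_steps):
            x += -alpha * x * dt + sig * rng.gauss(0.0, sqrt_dt)
            crossed = crossed or x > x_cr
        direct += crossed
        # Importance sampling: simulate X under Q, where the drift gains 2*alpha*gamma.
        x, w, crossed = x0, 0.0, False
        for _step in range(n_steps):
            dw = rng.gauss(0.0, sqrt_dt)
            x += (-alpha * x + 2.0 * alpha * gamma) * dt + sig * dw
            w += dw                                   # accumulates W(tau)
            crossed = crossed or x > x_cr
        if crossed:
            # dP/dQ = exp(-gamma*sig*W(tau) - gamma^2*alpha*tau)
            weighted += math.exp(-gamma * sig * w - gamma ** 2 * alpha * n_steps * dt)
    return direct / n_samples, weighted / n_samples
```

A call such as `first_passage_estimates(1.0, 0.0, 3.0, 1.0, 1.0)` returns both estimates for an illustrative setting with $X(0) = 0$. For rare events the direct estimate rests on a handful of crossings, while the weighted estimate uses many crossings with small weights. The choice of $\gamma$ is a trade-off: larger values produce more crossings but more variable weights.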
5.6 Exercises

Exercise 5.1. Develop a recurrence formula for calculating the stochastic integral
$\int_0^t C(s-)^n\, dC(s)$, where $C$ denotes a compound Poisson process and $n \geq 1$ is an
integer.

Exercise 5.2. Show that the summation in (5.7) is a semimartingale. 

Exercise 5.3. Let $X(t) = (X_1(t) = \cos(B(t)), X_2(t) = \sin(B(t)))$ be an $\mathbb{R}^2$-valued
stochastic process depending on a Brownian motion $B(t)$. Show that $X(t)$ is a diffusion
process and find the stochastic differential equation defining this process.

Exercise 5.4. Let $X(t)$, $t \geq 0$, be the Ornstein-Uhlenbeck process in Example 5.10.
Show that $Y(t) = \exp(X(t))$ is a diffusion process by constructing the stochastic
differential equation that defines this process.

Exercise 5.5. Find the relationship between the Ito and the Stratonovich integrals
$\int_0^t C(s-)\, dC(s)$ and $\int_0^t C(s-) \circ dC(s)$.

Exercise 5.6. Construct the Ito and the Stratonovich differential equations for $X(t) =
\exp(B(t))$, $t \geq 0$, where $B(t)$ is a Brownian motion.

Exercise 5.7. Let $h : \mathbb{R} \to \mathbb{R}$ be a differentiable function and consider the Ito
stochastic integral equation $X(t) = X(0) + \int_0^t h(X(s))\, dB(s) + (1/2) \int_0^t h(X(s))\,
h'(X(s))\, ds$. Find the solution of this equation.

Exercise 5.8. Show that the integration by parts formula $\int_0^t h(s)\, dB(s) = h(t)\, B(t)
- \int_0^t h'(s)\, B(s)\, ds$ holds for $h \in C^1[0, \infty)$.

Exercise 5.9. Calculate the first four moments of $X(t)$ in Example 5.24 for increasing
values of $\lambda$ and a fixed $\alpha > 0$ under the constraint $\lambda\, E[Y_1^2] = 2 \alpha$. Comment on the
dependence of the skewness and kurtosis of $X(t)$ on $\lambda$.

Exercise 5.10. Show that the expectation of the local time $L(t)$ in (5.46) exists and
is $E[L(t)] = \sqrt{2 t/\pi}$ for all $t \geq 0$. Estimate $E[L(t)]$ from samples of $|B(t)|$.

Exercise 5.11. Write a Monte Carlo algorithm for estimating the Prandtl stress func- 
tion in Example 5.27. 

Exercise 5.12. Let $U(x) = (1/n) \sum_{i=1}^n Z_i + (1/n) \sum_{i=1}^n I_i$ be an estimator of $u(x)$
in (5.57) corresponding to $n$ independent samples of $X$, where $Z_i$ and $I_i$ denote
independent copies of $\xi(X(T))$ and $\int_0^T p(X(\sigma))\, d\sigma$, respectively. Show that the
estimator $U(x)$ is unbiased and weakly consistent, that is, $E[U(x)] = u(x)$ and
$P(|U(x) - u(x)| > \varepsilon) \to 0$ as $n \to \infty$ for arbitrary $\varepsilon > 0$.

Exercise 5.13. Let $R \in L^1(P)$ be non-negative and $P$ and $Q$ be probability measures
on a measurable space $(\Omega, \mathcal{F})$ such that $dQ = R\, dP$. For $X \in L^1(Q)$ and a
sub-$\sigma$-field $\mathcal{G}$ of $\mathcal{F}$ show that $E_Q[X \mid \mathcal{G}] = E_P[X R \mid \mathcal{G}]/E_P[R \mid \mathcal{G}]$ holds $P$-almost
surely.

Hint Since $E_P[|X R|] = \int_\Omega |X|\, R\, dP = \int_\Omega |X|\, dQ = E_Q[|X|] < \infty$ by assumption,
$E_P[X R \mid \mathcal{G}]$ is defined. Properties of conditional expectation give

$$\int_G E_P[X R \mid \mathcal{G}]\, dP = \int_G X R\, dP = \int_G X\, dQ = \int_G E_Q[X \mid \mathcal{G}]\, dQ$$

and

$$\int_G E_Q[X \mid \mathcal{G}]\, dQ = \int_G E_Q[X \mid \mathcal{G}]\, R\, dP = \int_G E_P\big\{E_Q[X \mid \mathcal{G}]\, R \mid \mathcal{G}\big\}\, dP = \int_G E_Q[X \mid \mathcal{G}]\, E_P[R \mid \mathcal{G}]\, dP$$

for any $G \in \mathcal{G}$. These observations imply the stated relationship.

Exercise 5.14. Show that $\exp(B(t) - t/2)$ is an $\mathcal{F}_t$-martingale, where $\mathcal{F}_t = \sigma(B(s) :
0 \leq s \leq t)$ denotes the filtration generated by a Brownian motion $B(t)$.

Exercise 5.15. Apply Ito’s formula to $Z(t)$, $W(t)\, Z(t)$, and $(W(t)^2 - t)\, Z(t)$ to confirm
the expressions of the differential and integral form of these processes used in the
proof of Theorem 5.22.

Exercise 5.16. Let $(\Omega, \mathcal{F}, P)$ be a probability space and $(\mathcal{F}_t)_{t \geq 0}$ a filtration generated
by a Brownian motion. For $Y$ and $Z$ as in Theorem 5.20, show that an $\mathcal{F}_t$-adapted
process $Y(t)$, $0 \leq t \leq \tau$, is a $Q$-martingale if and only if $Y(t)\, Z(t)$ is a $P$-martingale,
where $Q(A) = \int_A Z(\tau)\, dP$, $A \in \mathcal{F}$.

Hint Suppose $Y(t)\, Z(t)$ is a $P$-martingale. Use Exercise 5.13 with $(Y(t), Z(\tau), \mathcal{F}_s)$ in
place of $(X, R, \mathcal{G})$ for $s \leq t$, the fact that $Z(t)$ is a $P$-martingale (Theorem 5.20), and
properties of conditional expectation. Show that $E_P[Y(t)\, Z(\tau) \mid \mathcal{F}_s] = Y(s)\, Z(s)$,
which gives $E_Q[Y(t) \mid \mathcal{F}_s] = Y(s)$. Similar arguments can be used to show that, if
$Y(t)$ is a $Q$-martingale, then $Y(t)\, Z(t)$ is a $P$-martingale.

Exercise 5.17. Construct a Monte Carlo algorithm for calculating the probability
$p_f(\tau) = P(\max_{0 \leq t \leq \tau} X(t) > x_{cr})$ based on Girsanov’s theorem, where $X$ is a diffusion
process defined by $dX(t) = (a_1 X(t) + a_2 X(t)^3)\, dt + b\, dB(t)$, $t \in [0, \tau]$, with
$X(0) = 0$. Test the algorithm against direct Monte Carlo simulation for constants
$a_1$, $a_2$, $b$, and $x_{cr}$ of your choice.

Chapter 6

Probabilistic Models 


6.1 Introduction 

Probabilistic models are used extensively in applications to characterize properties 
of physical systems and describe features of the input to these systems. Generally, in 
applications, model construction involves two phases, referred to as model selection 
and model calibration. 

Physics and any other available information can be used to select the functional
form of the probability law of a random element up to a vector $\theta$ of unknown or
uncertain parameters that remain unspecified. For example, Beta translation fields
may be used to model the spatial variation for a material conductivity since these
fields take values in bounded intervals. However, the range and the shape parameters
of marginal distributions for these fields as well as the parameters of their correlation
functions may not be known.

Data and other information can be used to calibrate a proposed model with functional
form specified up to a vector $\theta$ of uncertain parameters. The available information
can also be employed to select the optimal member of a collection of model
candidates and calibrate this model [8]. Our discussion is limited to the calibration
of a single proposed model. Frequentist and Bayesian methods are usually employed
to characterize $\theta$. The frequentist approach assumes that $\theta$ is a fixed but unknown
deterministic vector and uses data to construct estimates $\hat\theta$ for and confidence sets
on $\theta$. The construction of confidence sets is usually complex when the dimension
of $\theta$ is two or larger ([27], Chap. 4). In contrast, the Bayesian method views $\theta$ as a
random element and infers its law from postulated prior densities and data.

Model construction must account for both heuristic and theoretical considerations. 
We give a list of items that are relevant for model construction. First, models can only 
provide approximate representations for physical phenomena, that may or may not 
be useful depending on the objective of the analysis. In short, all models are wrong 
but some can be useful. 

Second, models provide the means for data extrapolation, which is needed since
available measurements are usually insufficient for solving practical problems. For
example, design of tall buildings for wind loads may require estimates of extreme
wind speeds over periods exceeding 1000 years, but the length of most wind speed
records is about 30-50 years. It is common to assume that yearly wind speed maxima
are independent samples of Weibull or other distributions depending on uncertain
parameters that can be estimated from data, and use the resulting models to estimate
extreme wind speeds over 1000 years or any other period. Models would not be
needed if wind records longer than 1000 years were available.

M. Grigoriu, Stochastic Systems, Springer Series in Reliability Engineering,
DOI: 10.1007/978-1-4471-2327-9_6, © Springer-Verlag London Limited 2012

Fig. 6.1 Histograms of $\sqrt{n}\, Y_n$ for $X_1 \sim \mathrm{EXP}(\lambda)$ with $\lambda = 1$ scaled to have mean 0 and variance 1
(left panel) and $X_1 \sim N(0, 1)$ (right panel) and $n = 20$

Third, the performance of a particular model depends on its use. For example,
let $X_1, X_2, \ldots$ be an iid sequence with finite variance. Suppose the mean and variance
of $X_1$ are known but the functional form of the distribution of this variable is
unknown. The available information is sufficient for estimating the distribution of
$Y_n = (1/n) \sum_{i=1}^n X_i$ provided $n$ is not very small but is inadequate to describe
$Z_n = \max_{1 \leq i \leq n} \{X_i\}$. Figure 6.1 shows histograms of $\sqrt{n}\, Y_n$, $n = 20$, for $X_1$
assumed to be exponentially distributed with decay parameter $\lambda = 1$ and scaled to have mean 0 and
variance 1 (left panel) and $X_1 \sim N(0, 1)$ (right panel). The dotted lines in the figures
are the density of $N(0, 1)$. Note that $\sqrt{n}\, Y_n$ is approximately Gaussian irrespective of
the distribution postulated for $X_1$. Figure 6.2 shows the densities of $Z_n$, $n = 20$, for
$X_1$ as in Fig. 6.1. The density of $Z_n$ depends strongly on the probability law of $X_1$.
The available information is insufficient for estimating the probability law of $Z_n$.
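
The contrast between the sample mean and the sample maximum can be reproduced with a small simulation. The sketch below is illustrative; the function names `simulate` and `quantile`, the number of replications, and the empirical-quantile rule are choices made here. The exponential variable is shifted by its mean so that it has mean 0 and variance 1, as in Fig. 6.1.

```python
import math
import random


def simulate(sampler, n=20, n_rep=5000, seed=0):
    """Return samples of sqrt(n) * Y_n and of Z_n = max(X_1, ..., X_n)."""
    rng = random.Random(seed)
    scaled_means, maxima = [], []
    for _ in range(n_rep):
        xs = [sampler(rng) for _ in range(n)]
        scaled_means.append(math.sqrt(n) * sum(xs) / n)
        maxima.append(max(xs))
    return scaled_means, maxima


def quantile(values, prob):
    """Empirical quantile taken from the sorted sample (simple rule, no interpolation)."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(prob * len(ordered)))]


def exp_sampler(rng):
    return rng.expovariate(1.0) - 1.0      # EXP(1) shifted to mean 0; its variance is 1


def normal_sampler(rng):
    return rng.gauss(0.0, 1.0)
```

Comparing, for example, the 0.95 quantiles of the two returned lists for `exp_sampler` and `normal_sampler` shows similar values for the scaled mean but markedly different values for the maximum, in line with Figs. 6.1 and 6.2.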

Fourth, most models are usually consistent with the physics of the problem under
consideration. Physically inconsistent models have been and should be used if they
provide satisfactory solutions and are computationally much more efficient
than physically consistent models. For example, Gaussian models can be used to
describe daily flows in the Nile river since they have a large mean and a small 
variance. However, Gaussian models are inadequate for small rivers characterized 
by small average daily flow and large flow variability since they may predict negative 
daily flows with large probabilities. 

Fifth, not all uncertain parameters of a postulated model can be inferred directly
from data. Properties of postulated models need to be used in this case to estimate
model parameters. The estimation may require additional assumptions that may further
reduce model accuracy.

Fig. 6.2 Densities of $Z_n$ for $X_1 \sim \mathrm{EXP}(\lambda)$ with $\lambda = 1$ scaled to have mean 0 and
variance 1 (dotted line) and $X_1 \sim N(0, 1)$ (solid line) for $n = 20$

The following sections discuss and illustrate the construction of models for 
random variables and functions. Physics and data are used to select the functional 
form of various models and specify their uncertain parameters. Frequentist and 
Bayesian methods are applied for model calibration. 


6.2 Random Variables 

Let $X$ be an $\mathbb{R}^d$-valued random variable defined on a probability space $(\Omega, \mathcal{F}, P)$.
Our objective is to construct models for the probability law of $X$ from independent
samples of this random variable and any other information when available.
Our construction uses Frequentist and Bayesian methods, and considers Gaussian
and non-Gaussian vectors. The methods in this section are applied to construct a
probabilistic model for generating directional wind speeds for hurricanes.


6.2.1 Gaussian Variables 

Let $X = (X_1, \ldots, X_d)$ be an $\mathbb{R}^d$-valued Gaussian random variable with mean vector
$\mu = \{\mu_k = E[X_k],\ k = 1, \ldots, d\}$, covariance matrix $\gamma = \{\gamma_{kl} = E[(X_k - \mu_k)
(X_l - \mu_l)],\ k, l = 1, \ldots, d\}$, and density

$$f(x) = \big[(2 \pi)^d \det(\gamma)\big]^{-1/2} \exp\left[-\frac{1}{2}\, (x - \mu)^T \gamma^{-1} (x - \mu)\right], \quad x \in \mathbb{R}^d, \tag{6.1}$$

where $\det(\gamma)$ denotes the determinant of $\gamma$.

The size of the vector $\theta$ of uncertain parameters is $d_\theta = d + d(d + 1)/2$ if all
entries of $\mu$ and $\gamma$ are uncertain, and $d_\theta < d + d(d + 1)/2$ otherwise. Suppose
that $\mu$ and $\gamma$ or some of their entries are uncertain and that $n$ independent samples
$(x^{(1)}, \ldots, x^{(n)})$ of $X$ are available.


6.2.1.1 Frequentist Method 

The method of moments and the method of likelihood function are commonly applied
to construct point estimates $\hat\theta$ for $\theta$ and confidence sets ([27], Chaps. 3, 4,
and 6). The method of moments calculates point estimates $\hat\theta$ for $\theta$ from
equations obtained by equating $d_\theta$ moments of $X$ that depend on $\theta$ to estimates
of these moments obtained from data. For example, suppose $X$ is a Gaussian random variable
($d = 1$) with probability law depending on the uncertain parameters $\theta = (\mu, \gamma)$.
In this case,

$$\hat\mu = \frac{1}{n}\sum_{i=1}^{n} x^{(i)} \quad \text{and} \quad \hat\gamma = \frac{1}{n-1}\sum_{i=1}^{n} \left(x^{(i)} - \hat\mu\right)^2, \qquad (6.2)$$

and probabilities of events related to $X$ can be calculated approximately from the law
of $X$ with $\hat\theta = (\hat\mu, \hat\gamma)$ in place of $\theta$. Accordingly, the
probability of $\{X > a\}$, $a \in \mathbb{R}$, is approximated by
$P(X > a) \simeq \Phi(-(a - \hat\mu)/\hat\gamma^{1/2})$.
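
The moment estimates in (6.2) and the resulting approximation of $P(X > a)$ can be sketched
in a few lines of Python. The function names, the synthetic data, and the threshold $a = 3$
are assumptions made only for illustration.

```python
import math
import random
from statistics import NormalDist


def moment_estimates(samples):
    """Return (mu_hat, gamma_hat) as in (6.2): sample mean and (n-1)-normalized variance."""
    n = len(samples)
    mu_hat = sum(samples) / n
    gamma_hat = sum((x - mu_hat) ** 2 for x in samples) / (n - 1)
    return mu_hat, gamma_hat


def tail_probability(a, mu_hat, gamma_hat):
    """Approximate P(X > a) by Phi(-(a - mu_hat) / gamma_hat**0.5)."""
    return NormalDist().cdf(-(a - mu_hat) / math.sqrt(gamma_hat))


# Synthetic data (an assumption for illustration): X ~ N(1, 4).
rng = random.Random(0)
data = [rng.gauss(1.0, 2.0) for _ in range(500)]
mu_hat, gamma_hat = moment_estimates(data)
p_hat = tail_probability(3.0, mu_hat, gamma_hat)
print(mu_hat, gamma_hat, p_hat)
```

The estimate `p_hat` ignores the uncertainty in $(\hat\mu, \hat\gamma)$, which is the limitation
that confidence sets and the Bayesian method address.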

In the maximum likelihood method, $\hat\theta$ is such that it maximizes the likelihood
function

$$\ell(\theta \mid \text{data}) = \prod_{i=1}^{n} f(x^{(i)}; \theta), \qquad (6.3)$$

viewed as a function of $\theta$, where $(x^{(1)}, \ldots, x^{(n)})$ is the available sample and
$f(\cdot; \theta)$ denotes the density of $X$. We refer to $\hat\theta$ as the maximum
likelihood estimate (MLE) of $\theta$. For example, suppose $X$ is a real-valued Gaussian
variable with unknown mean $\mu$ and variance $\gamma$. The likelihood function in (6.3) is

$$\ell(\theta \mid \text{data}) = \ell(\mu, \gamma \mid \text{data}) = (2\pi\gamma)^{-n/2} \exp\left[-\frac{1}{2}\sum_{i=1}^{n} \frac{(x^{(i)} - \mu)^2}{\gamma}\right]. \qquad (6.4)$$

It is convenient to find the MLE $(\hat\mu, \hat\gamma)$ by maximizing the logarithm
$\ln(\ell(\theta \mid \text{data}))$ of the likelihood function. The resulting MLEs are
$\hat\mu = (1/n)\sum_{i=1}^{n} x^{(i)}$ and $\hat\gamma = (1/n)\sum_{i=1}^{n} (x^{(i)} - \hat\mu)^2$.
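
As a numerical check of these closed-form MLEs, the following sketch evaluates the logarithm
of (6.4) and compares its value at $(\hat\mu, \hat\gamma)$ with values at nearby parameter
pairs. The data and the perturbation sizes are arbitrary assumptions.

```python
import math


def log_likelihood(mu, gamma, samples):
    """Logarithm of the Gaussian likelihood (6.4)."""
    n = len(samples)
    ss = sum((x - mu) ** 2 for x in samples)
    return -0.5 * n * math.log(2.0 * math.pi * gamma) - 0.5 * ss / gamma


def gaussian_mle(samples):
    """Closed-form MLEs: sample mean and variance normalized by n (not n - 1)."""
    n = len(samples)
    mu_hat = sum(samples) / n
    gamma_hat = sum((x - mu_hat) ** 2 for x in samples) / n
    return mu_hat, gamma_hat


data = [0.3, 1.9, 2.4, 0.8, 1.1, 2.7]  # illustrative values only
mu_hat, gamma_hat = gaussian_mle(data)
best = log_likelihood(mu_hat, gamma_hat, data)

# The log-likelihood at perturbed parameters should never exceed the value at the MLE.
perturbed = [
    log_likelihood(mu_hat + dm, gamma_hat * scale, data)
    for dm in (-0.1, 0.0, 0.1)
    for scale in (0.8, 1.0, 1.25)
]
print(best, max(perturbed))
```

Note that the MLE of the variance divides by $n$, whereas the moment estimate in (6.2)
divides by $n - 1$; the two agree for large $n$.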

Confidence sets can be used to quantify the uncertainty in the unknown parameters of a
model. For example, suppose $X$ is a real-valued Gaussian variable with unknown mean $\mu$
and known variance $\gamma$. Let $U = (1/n)\sum_{i=1}^{n} X^{(i)}$ be an estimator for $\mu$
corresponding to $n$ independent samples of $X$, where $X^{(i)}$ are independent copies of
$X$. Since $U$ is a Gaussian variable with mean $\mu$ and variance $\gamma/n$,
$\sqrt{n/\gamma}\,(U - \mu) \sim N(0, 1)$ so that
$P(-\alpha_{p/2} < \sqrt{n/\gamma}\,(U - \mu) < \alpha_{p/2}) = p$ for $p \in (0, 1)$ and
$\Phi(\alpha_{p/2}) = (1 + p)/2$ or, equivalently,
$P(U - \alpha_{p/2}\sqrt{\gamma/n} < \mu < U + \alpha_{p/2}\sqrt{\gamma/n}) = p$. The interval
$I(p) = (L_1 = U - \alpha_{p/2}\sqrt{\gamma/n},\ L_2 = U + \alpha_{p/2}\sqrt{\gamma/n})$ is
referred to as a (symmetric) confidence set for the mean $\mu$ of $X$. Note that, on average,
a percentage of $100p$ samples $(l_1, l_2)$ of $I(p) = (L_1, L_2)$ corresponding to sets of $n$
independent samples of $X$ include $\mu$. The confidence interval $(L_1, L_2)$ on $\mu$ can be
used to construct a similar confidence set on $P(X > a)$, $a \in \mathbb{R}$, since this
probability is an increasing function of $\mu$ for a fixed $a$. The $p$-confidence interval
on $P(X > a)$ is $(L_{p,1} = \Phi((L_1 - a)/\sigma),\ L_{p,2} = \Phi((L_2 - a)/\sigma))$, where
$\sigma = \gamma^{1/2}$.
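
The construction of the interval $(L_1, L_2)$ and its mapping into an interval on
$P(X > a)$ can be sketched as follows. The data, the known variance $\gamma = 0.25$, the level
$p = 0.95$, and the threshold $a = 2$ are illustrative assumptions.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(samples, gamma, p):
    """Symmetric p-confidence interval (L1, L2) for the mean when the variance gamma is known."""
    n = len(samples)
    u = sum(samples) / n
    alpha = NormalDist().inv_cdf((1.0 + p) / 2.0)  # Phi(alpha) = (1 + p) / 2
    half_width = alpha * math.sqrt(gamma / n)
    return u - half_width, u + half_width


def exceedance_interval(l1, l2, a, gamma):
    """Map (L1, L2) into the interval (Phi((L1 - a)/sigma), Phi((L2 - a)/sigma)) on P(X > a)."""
    sigma = math.sqrt(gamma)
    phi = NormalDist().cdf
    return phi((l1 - a) / sigma), phi((l2 - a) / sigma)


data = [1.2, 0.7, 1.9, 1.4, 0.9, 1.6, 1.1, 1.3]  # illustrative values only
l1, l2 = mean_confidence_interval(data, gamma=0.25, p=0.95)
lp1, lp2 = exceedance_interval(l1, l2, a=2.0, gamma=0.25)
print((l1, l2), (lp1, lp2))
```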

Similar considerations can be applied in the more general case in which both
parameters $(\mu, \gamma)$ of the distribution of $X$ are unknown. However, the construction
of confidence sets is less simple ([27], Theorem 4.2.5, Sect. 4.3). Moreover, it is
difficult to map these confidence sets on $(\mu, \gamma)$ into confidence sets on the
probability of events depending on $X$.


6.2.1.2 Bayesian Method

Let $X$ be a Gaussian variable with unknown mean $\mu$ and variance $\gamma = 1/h$. The
uncertain parameters $\theta = (\mu, h)$ of the law of $X$ are viewed as random variables,
so that the density of the conditional random variable $X \mid \theta$ is
$f(x) = (h/(2\pi))^{1/2} \exp(-h(x - \mu)^2/2)$. Suppose that, as previously, $n$ independent
samples $(x^{(1)}, \ldots, x^{(n)})$ of $X$ are available and that, in addition, there is
preliminary information on $\theta$ that can be quantified by a prior density $f'(\theta)$.
For $X \sim N(\mu, 1/h)$, it is common to describe the prior information on
$\theta = (\mu, h)$ by the normal-gamma density



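$$f'(\mu, h) \propto h^{1/2} \exp\left(-\tfrac{1}{2} h\, n' (\mu - \mu')^2\right) h^{\nu'/2 - 1} \exp\left(-\tfrac{1}{2} \nu' \xi' h\right) \qquad (6.5)$$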


with parameters $(\mu', n', \xi', \nu')$, $\mu' \in \mathbb{R}$, $n', \xi', \nu' > 0$, that is,
$\mu \mid h$ is a Gaussian variable with mean $\mu'$ and variance $1/(h n')$ and $h$ is a
Gamma2 variable with parameters $(\xi', \nu')$. The posterior density $f''(\theta)$ of
$\theta$ can be calculated from

$$f''(\theta) \propto f'(\theta)\, \ell(\theta \mid \text{data}), \qquad (6.6)$$


where $\ell(\theta \mid \text{data})$ denotes the likelihood function and the symbol
$\propto$ means proportionality. For the prior density in (6.5), $f''(\theta)$ is also a
normal-gamma density with parameters $(\mu'', n'', \xi'', \nu'')$ given by

$$n'' = n + n', \quad \nu'' = n + \nu', \quad \mu'' = (n\bar{x} + n'\mu')/n'',$$
$$\xi''\nu'' = \xi'\nu' + n'(\mu')^2 + (n - 1)s^2 + n\bar{x}^2 - n''(\mu'')^2, \qquad (6.7)$$

where $\bar{x} = (1/n)\sum_{i=1}^{n} x^{(i)}$ and $s^2 = \sum_{i=1}^{n} (x^{(i)} - \bar{x})^2/(n - 1)$
([31], p. 55 and Chap. 7). Prior density functions with the property that their posterior
densities are of the same type are called conjugate priors. If there is no prior information
on $\theta = (\mu, h)$, we can use the so-called noninformative or vague prior density
$f'(\mu, h) \propto 1/h$. Under this prior information, the Frequentist and the Bayesian
methods use the same information.
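
The conjugate update in (6.7) amounts to a handful of arithmetic operations. The sketch below
is one possible transcription; the function name and the numerical inputs are assumptions
introduced only for illustration.

```python
def normal_gamma_update(samples, mu_prior, n_prior, xi_prior, nu_prior):
    """Posterior normal-gamma parameters (mu'', n'', xi'', nu'') following (6.7)."""
    n = len(samples)
    xbar = sum(samples) / n
    s2 = sum((x - xbar) ** 2 for x in samples) / (n - 1)
    n_post = n + n_prior
    nu_post = n + nu_prior
    mu_post = (n * xbar + n_prior * mu_prior) / n_post
    xi_nu_post = (
        xi_prior * nu_prior
        + n_prior * mu_prior ** 2
        + (n - 1) * s2
        + n * xbar ** 2
        - n_post * mu_post ** 2
    )
    return mu_post, n_post, xi_nu_post / nu_post, nu_post


# Illustrative prior parameters and data (assumptions, not values from the text).
posterior = normal_gamma_update([1.0, 2.0, 3.0], mu_prior=0.0, n_prior=1.0,
                                xi_prior=1.0, nu_prior=1.0)
print(posterior)
```

With these inputs the posterior mean parameter is pulled from the sample mean 2 toward the
prior mean 0, in proportion to the weights $n$ and $n'$.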

The posterior density $f''(\theta)$ accounts for all available information, that is, data
and prior knowledge on $\theta$, and can be used to construct point estimates of and
confidence sets on $\theta$. For example, the expectation of $\theta$ with respect to the
posterior density $f''(\theta)$ can be used as an estimate for $\theta$. The probability of
the event $\{X \in D = (a, b)\}$, $a < b$, can be calculated from

$$P(X \in D) = \int \left[\Phi\left((b - \mu)h^{1/2}\right) - \Phi\left((a - \mu)h^{1/2}\right)\right] f''(\mu, h)\, d\mu\, dh \qquad (6.8)$$

since $X \mid \theta$ is a Gaussian variable. Note that the uncertainty in the parameters of
the law of $X$ can be incorporated simply in the Bayesian analysis by following the
procedure in (6.8), in contrast to similar calculations within the Frequentist framework
that pose notable difficulties. An extensive discussion on the Bayesian method can
be found in [31] (Chap. 3) and [41] (Chaps. II, III, and VIII).
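
One way to evaluate (6.8) is Monte Carlo integration over samples of $(\mu, h)$ drawn from
the posterior density. The sketch below assumes that a Gamma2 variable with parameters
$(\xi, \nu)$ has density proportional to $h^{\nu/2 - 1}\exp(-\nu\xi h/2)$, that is, a gamma
distribution with shape $\nu/2$ and scale $2/(\nu\xi)$; this convention, the function name,
and the numerical values are assumptions.

```python
import math
import random
from statistics import NormalDist


def predictive_probability(a, b, mu_post, n_post, xi_post, nu_post,
                           n_draws=20000, seed=1):
    """Monte Carlo estimate of P(a < X < b) in (6.8) under a normal-gamma posterior."""
    rng = random.Random(seed)
    phi = NormalDist().cdf
    total = 0.0
    for _ in range(n_draws):
        # h ~ gamma(shape = nu''/2, scale = 2/(nu'' xi'')), the assumed Gamma2 convention.
        h = rng.gammavariate(nu_post / 2.0, 2.0 / (nu_post * xi_post))
        # mu | h ~ N(mu'', 1/(h n'')).
        mu = rng.gauss(mu_post, 1.0 / math.sqrt(h * n_post))
        root_h = math.sqrt(h)
        total += phi((b - mu) * root_h) - phi((a - mu) * root_h)
    return total / n_draws


p_est = predictive_probability(-1.0, 1.0, mu_post=0.0, n_post=10.0,
                               xi_post=1.0, nu_post=10.0, n_draws=5000)
print(p_est)
```

The estimate averages the conditional Gaussian probability over parameter uncertainty, so it
is generally smaller than the value obtained by plugging in point estimates.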


6.2.2 Translation Variables 

Let $G = (G_1, \ldots, G_d)$ be an $\mathbb{R}^d$-valued Gaussian variable with means
$E[G_k] = 0$, variances $E[G_k^2] = 1$, $k = 1, \ldots, d$, and covariances
$\rho = \{\rho_{kl} = E[G_k G_l],\ k, l = 1, \ldots, d\}$. Define an $\mathbb{R}^d$-valued
non-Gaussian variable $X = (X_1, \ldots, X_d)$, referred to as translation vector, by

$$X_k = F_k^{-1} \circ \Phi(G_k), \quad k = 1, \ldots, d, \qquad (6.9)$$

where $F_k$, $k = 1, \ldots, d$, are arbitrary continuous distributions (Sect. 3.7.2).
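
As an illustration of (6.9), the following sketch generates samples of a bivariate
translation vector with a Gaussian first coordinate and an exponential second coordinate.
The marginals, the parameter values, and the function name are assumptions chosen for
illustration.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()


def sample_translation_vector(mu, sigma, lam, rho, rng):
    """One sample of X = (X1, X2), X1 ~ N(mu, sigma^2), X2 ~ EXP(lam), via (6.9)."""
    # Correlated standard Gaussian pair (G1, G2) with E[G1 G2] = rho.
    g1 = rng.gauss(0.0, 1.0)
    g2 = rho * g1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    # F1^{-1}(Phi(g1)) for a Gaussian marginal reduces to a shift and scale.
    x1 = mu + sigma * g1
    # F2^{-1}(u) = -ln(1 - u)/lam with 1 - Phi(g2) = Phi(-g2).
    x2 = -math.log(_STD.cdf(-g2)) / lam
    return x1, x2


rng = random.Random(0)
samples = [sample_translation_vector(1.0, 1.0, 2.0, 0.7, rng) for _ in range(2000)]
mean_x2 = sum(x2 for _, x2 in samples) / len(samples)
print(mean_x2)  # should be close to 1/lam = 0.5
```

Using $\Phi(-g)$ in place of $1 - \Phi(g)$ avoids loss of precision for large $g$.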

Consider an $\mathbb{R}^d$-valued non-Gaussian translation variable $X$. Suppose that (1) the
functional forms of its marginal distributions $\{F_k\}$ are known but all or some of their
parameters are uncertain and are collected in vectors $\theta^{(k)}$ and (2) the information
on $X$ consists of $n$ independent samples $(x^{(1)}, \ldots, x^{(n)})$ and possibly prior
information on the uncertain parameters in the probability law of $X$.

If the marginal distributions $\{F_k\}$ of $X$ are known, the observations
$(x^{(1)}, \ldots, x^{(n)})$ can be mapped into their Gaussian images
$(g^{(1)}, \ldots, g^{(n)})$ by the transformations $g_k^{(i)} = \Phi^{-1} \circ F_k(x_k^{(i)})$
for $k = 1, \ldots, d$ and $i = 1, \ldots, n$. Estimation procedures presented in a previous
section for Gaussian variables can be applied to estimate the covariance matrix $\rho$ of the
Gaussian image $G$ of $X$. The estimate of $\rho$ and the distributions $\{F_k\}$ define the
probability law of $X$.

We consider the case in which the functional forms of the distributions $\{F_k\}$
are known but their parameters and the covariance matrix $\rho$ are unknown. A
two-dimensional translation vector $X$ with coordinates $X_1 \sim N(\mu, \gamma = \sigma^2)$
and $X_2 \sim \mathrm{EXP}(\lambda)$ depending on the unknown parameters $\mu$, $\sigma$, and
$\lambda$ is used for illustration. Note that it is not possible to calculate the Gaussian
image $(g^{(1)}, \ldots, g^{(n)})$ of $(x^{(1)}, \ldots, x^{(n)})$ in this case since the mappings
$x_1^{(i)} \mapsto g_1^{(i)} = (x_1^{(i)} - \mu)/\sigma$ and
$x_2^{(i)} \mapsto g_2^{(i)} = \Phi^{-1}(1 - \exp(-\lambda x_2^{(i)}))$, $i = 1, \ldots, n$, are
uncertain.


6.2.2.1 Frequentist Method 


The method of moments can be used to calculate point estimates $\{\hat\theta^{(k)}\}$ for the
uncertain parameters $\{\theta^{(k)}\}$ from the available data $(x^{(1)}, \ldots, x^{(n)})$ and
construct point estimates $\hat\rho$ for the covariance matrix $\rho$ of $G$ from the Gaussian
image $(\hat{g}^{(1)}, \ldots, \hat{g}^{(n)})$ of $(x^{(1)}, \ldots, x^{(n)})$ obtained via the
mapping $X \mapsto G$ with $\{\hat\theta^{(k)}\}$ in place of $\{\theta^{(k)}\}$. For example,
the approximate mappings for the two-dimensional translation vector $X$ with coordinates
$X_1 \sim N(\mu, \sigma^2)$ and $X_2 \sim \mathrm{EXP}(\lambda)$ are
$x_1^{(i)} \mapsto \hat{g}_1^{(i)} = (x_1^{(i)} - \hat\mu)/\hat\sigma$ and
$x_2^{(i)} \mapsto \hat{g}_2^{(i)} = \Phi^{-1}(1 - \exp(-\hat\lambda x_2^{(i)}))$, where
$(\hat\mu, \hat\sigma, \hat\lambda)$ are point estimates for $(\mu, \sigma, \lambda)$ obtained by
the method of moments. A point estimate $\hat\rho$ for $\rho$ can be obtained by the method of
moments from the Gaussian images $\hat{g}^{(i)} = (\hat{g}_1^{(i)}, \hat{g}_2^{(i)})$ of
$x^{(i)}$, $i = 1, \ldots, n$. The method is simple but its accuracy depends essentially on the
sample size $n$. If $n$ is sufficiently large such that the uncertainty in the point estimates
$\hat\mu$, $\hat\sigma$, $\hat\lambda$, and $\hat\rho$ is small, then the method can be adequate.
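
The two-stage moment procedure for the Gaussian/exponential example can be sketched as
follows. The synthetic data set, its true parameters $(\mu, \sigma, \lambda, \rho) =
(1, 1, 2, 0.7)$, and the clipping constant used to keep $\Phi^{-1}$ finite are assumptions.

```python
import math
import random
from statistics import NormalDist

_STD = NormalDist()


def calibrate_by_moments(x1, x2):
    """Moment estimates (mu, sigma, lam) of the marginals, then rho from the Gaussian image."""
    n = len(x1)
    mu_hat = sum(x1) / n
    sigma_hat = math.sqrt(sum((v - mu_hat) ** 2 for v in x1) / (n - 1))
    lam_hat = n / sum(x2)  # E[X2] = 1/lam for the exponential marginal
    g1 = [(v - mu_hat) / sigma_hat for v in x1]
    g2 = []
    for v in x2:
        u = 1.0 - math.exp(-lam_hat * v)
        u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep the inverse cdf finite
        g2.append(_STD.inv_cdf(u))
    # G has zero means and unit variances, so rho = E[G1 G2].
    rho_hat = sum(a * b for a, b in zip(g1, g2)) / n
    return mu_hat, sigma_hat, lam_hat, rho_hat


# Synthetic samples of the translation vector (illustration only).
rng = random.Random(3)
x1, x2 = [], []
for _ in range(4000):
    g1 = rng.gauss(0.0, 1.0)
    g2 = 0.7 * g1 + math.sqrt(1.0 - 0.7 ** 2) * rng.gauss(0.0, 1.0)
    x1.append(1.0 + g1)
    x2.append(-math.log(_STD.cdf(-g2)) / 2.0)

mu_hat, sigma_hat, lam_hat, rho_hat = calibrate_by_moments(x1, x2)
print(mu_hat, sigma_hat, lam_hat, rho_hat)
```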

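The method of the likelihood function can be used to construct simultaneous
point estimates for both the uncertain parameters $\{\theta^{(k)}\}$ and $\rho$. The MLEs for
these parameters maximize the likelihood function

$$\ell(\theta^{(1)}, \ldots, \theta^{(d)}, \rho \mid \text{data}) \propto \det(\rho)^{-n/2} \prod_{i=1}^{n} \left[\prod_{k=1}^{d} \frac{f_k(x_k^{(i)})}{\phi(g_k^{(i)})}\right] \exp\left(-\frac{1}{2}(g^{(i)})^T \rho^{-1} g^{(i)}\right), \qquad (6.10)$$

where $g^{(i)} = (g_1^{(i)}, \ldots, g_d^{(i)})$ and $g_k^{(i)} = \Phi^{-1} \circ F_k(x_k^{(i)})$,
$k = 1, \ldots, d$, $i = 1, \ldots, n$, $f_k$ denotes the density of $F_k$, and $\phi$ is the
density of the standard Gaussian variable. Analytical expressions are possible for these
estimates only in special cases. Generally, optimization algorithms need to be used to find
MLEs for $\{\theta^{(k)}\}$ and $\rho$.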

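For example, the joint density of a bivariate translation vector $X$ with coordinates
$X_1 \sim N(\mu, \sigma^2)$ and $X_2 \sim \mathrm{EXP}(\lambda)$ is

$$f(x_1, x_2) = \frac{\lambda}{\sigma\sqrt{2\pi(1 - \rho^2)}} \exp\left[-\lambda x_2 + \frac{g_2^2}{2} - \frac{g_1^2 - 2\rho g_1 g_2 + g_2^2}{2(1 - \rho^2)}\right], \qquad (6.11)$$

where $g_1 = (x_1 - \mu)/\sigma$ and $g_2 = \Phi^{-1}(1 - e^{-\lambda x_2})$. Figure 6.3 shows the
density $f(x_1, x_2)$ for $\mu = 1$, $\sigma = 1$, $\lambda = 2$, and two values of the
correlation coefficient $\rho$ of its Gaussian image, $\rho = -0.7$ (left panel) and
$\rho = 0.7$ (right panel).

Fig. 6.3 Joint density of $X$ for $\mu = 1$, $\sigma = 1$, $\lambda = 2$, $\rho = -0.7$ (left
panel), and $\rho = 0.7$ (right panel)

The likelihood function corresponding to $n$ independent samples
$(x^{(1)}, \ldots, x^{(n)})$ of $X$ is

$$\ell(\mu, \sigma, \lambda, \rho \mid \text{data}) = \left[\frac{\lambda}{\sigma\sqrt{2\pi(1 - \rho^2)}}\right]^n \exp\left\{\sum_{i=1}^{n}\left[-\lambda x_2^{(i)} + \frac{(g_2^{(i)})^2}{2} - \frac{(g_1^{(i)})^2 - 2\rho g_1^{(i)} g_2^{(i)} + (g_2^{(i)})^2}{2(1 - \rho^2)}\right]\right\}, \qquad (6.12)$$

where $g_1^{(i)} = (x_1^{(i)} - \mu)/\sigma$, $g_2^{(i)} = \Phi^{-1}(1 - \exp(-\lambda x_2^{(i)}))$,
$\mu \in (-\infty, \infty)$, $\sigma, \lambda > 0$, and $\rho \in [-1, 1]$. The maximum likelihood
estimate of $(\mu, \sigma, \lambda, \rho)$ has to be obtained by optimization. Classical
algorithms may fail since they could get trapped in local maxima. Genetic or any other
global optimization algorithms usually provide accurate and reliable solutions.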
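
A minimal sketch of this optimization problem is given below. It codes the logarithm of the
joint density of the Gaussian/exponential translation vector and maximizes the resulting
log-likelihood by plain random search, which is used here only as a simple stand-in for the
global optimizers mentioned above. The search box, the number of trials, and the data are
assumptions.

```python
import math
import random
from statistics import NormalDist

_INV = NormalDist().inv_cdf


def log_density(x1, x2, mu, sigma, lam, rho):
    """Logarithm of the joint density of (X1, X2), X1 ~ N(mu, sigma^2), X2 ~ EXP(lam)."""
    g1 = (x1 - mu) / sigma
    u = min(max(1.0 - math.exp(-lam * x2), 1e-12), 1.0 - 1e-12)  # keep inverse cdf finite
    g2 = _INV(u)
    quad = (g1 * g1 - 2.0 * rho * g1 * g2 + g2 * g2) / (2.0 * (1.0 - rho * rho))
    return (math.log(lam) - math.log(sigma)
            - 0.5 * math.log(2.0 * math.pi * (1.0 - rho * rho))
            - lam * x2 + 0.5 * g2 * g2 - quad)


def log_likelihood(theta, data):
    """Sum of log densities over independent samples; theta = (mu, sigma, lam, rho)."""
    mu, sigma, lam, rho = theta
    return sum(log_density(x1, x2, mu, sigma, lam, rho) for x1, x2 in data)


def random_search(data, n_trials=2000, seed=0):
    """Crude global search over a hypothetical box of parameter values."""
    rng = random.Random(seed)
    best_theta, best_value = None, -math.inf
    for _ in range(n_trials):
        theta = (rng.uniform(-3.0, 3.0), rng.uniform(0.2, 3.0),
                 rng.uniform(0.2, 5.0), rng.uniform(-0.95, 0.95))
        value = log_likelihood(theta, data)
        if value > best_value:
            best_theta, best_value = theta, value
    return best_theta, best_value


data = [(0.5, 0.3), (1.2, 0.1), (0.9, 0.8), (1.8, 0.6), (0.2, 0.2)]  # illustrative only
theta_hat, value_hat = random_search(data, n_trials=500)
print(theta_hat, value_hat)
```

In practice the random search would be replaced or refined by a dedicated global optimizer,
started for example from the moment estimates.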


6.2.2.2 Bayesian Method 

Suppose as previously that $X$ is an $\mathbb{R}^d$-valued translation vector defined by (6.9)
whose probability law depends on the uncertain parameters $\{\theta^{(k)}\}$,
$k = 1, \ldots, d$, and $\rho$. The likelihood function corresponding to $n$ independent
samples $(x^{(1)}, \ldots, x^{(n)})$ of $X$ is given by (6.10). The posterior density of the
uncertain parameters of the probability law of $X$ is given by

$$f''(\theta^{(1)}, \ldots, \theta^{(d)}, \rho) \propto f'(\theta^{(1)}, \ldots, \theta^{(d)}, \rho)\, \ell(\theta^{(1)}, \ldots, \theta^{(d)}, \rho \mid x^{(1)}, \ldots, x^{(n)}), \qquad (6.13)$$

where $f'(\theta^{(1)}, \ldots, \theta^{(d)}, \rho)$ denotes the prior density of these parameters.

The posterior density in (6.13) can be used to find point estimates of and confidence
sets on the uncertain parameters $(\theta^{(1)}, \ldots, \theta^{(d)}, \rho)$ and calculate, for
example, the expectation $E[1(X \in D)]$, $D \subset \mathbb{R}^d$, with respect to
$f''(\theta^{(1)}, \ldots, \theta^{(d)}, \rho)$.

Let $X$ be a bivariate translation vector with coordinates $X_1 \sim N(\mu, \sigma^2)$ and
$X_2 \sim \mathrm{EXP}(\lambda)$. Assume $\mu = 0$, so that $(\sigma, \lambda, \rho)$ are the
uncertain parameters in the probability law of $X$. Suppose the data used to calculate the
likelihood function in (6.12) is available and that the prior information on the uncertain
parameters can be quantified by

$$f'(\sigma, \lambda, \rho) \propto 1(\sigma \in (\sigma_1, \sigma_2))\, 1(\lambda \in (\lambda_1, \lambda_2))\, 1(\rho \in (\rho_1, \rho_2)), \qquad (6.14)$$

where $(\sigma_1, \sigma_2) = (0.5, 1.5)$, $(\lambda_1, \lambda_2) = (1, 3)$, and
$(\rho_1, \rho_2) = (0.1, 0.9)$. The top two panels of Fig. 6.4 show two data sets each
consisting of five independent samples of $X$ with $(\sigma = 1, \lambda = 2, \rho = 0.7)$. The
bottom panels in the figure show posterior density functions of $X$ corresponding to these
data sets and the prior density $f'(\sigma, \lambda, \rho)$ in (6.14). Since
$f'(\sigma, \lambda, \rho)$ is constant over its support
$(\sigma_1, \sigma_2) \times (\lambda_1, \lambda_2) \times (\rho_1, \rho_2)$, the shapes of the
posterior densities of $(\sigma, \lambda, \rho)$ and $X$ are determined by data. The significant
differences between the densities of $X$ in the left and the right panels of Fig. 6.4 suggest
that unsatisfactory approximations may result for the posterior density of $X$ under a
noninformative prior if the sample size is small.


6.2.3 Bounded Variables

Let $X$ be an $\mathbb{R}^d$-valued random variable taking values in a bounded rectangle
$D = \times_{i=1}^{d} [a_i, b_i] \subset \mathbb{R}^d$. Suppose the functional form of the
distribution $F$ of $X$, including the parameters specifying the shape of this distribution,
is known. The only unknown parameters of $F$ are those specifying its support. For example,
we may take $X_i = a_i + (b_i - a_i)Y_i$, $i = 1, \ldots, d$, where $Y = (Y_1, \ldots, Y_d)$ is
an $\mathbb{R}^d$-valued random variable with support $[0, 1]^d$ and known probability law.
However, the parameters $\{a_i, b_i\}$ giving the range of $X$ are not known. Let $\hat{X}$ be
the random variable with coordinates $\hat{X}_i = \hat{a}_i + (\hat{b}_i - \hat{a}_i)Y_i$,
$i = 1, \ldots, d$, where $\{\hat{a}_i, \hat{b}_i\}$ are estimates of $\{a_i, b_i\}$. Generally,
the support $\hat{D} = \times_{i=1}^{d} [\hat{a}_i, \hat{b}_i]$ of $\hat{X}$ differs from $D$.

Example 6.1 Let $X$ be a Beta random variable with range $[a, b]$ and shape parameters
$(p, q)$, $p, q > 1$, so that $Y$ defined by $X = a + (b - a)Y$ is a standard Beta variable
with range $[0, 1]$, density $f(y) = y^{p-1}(1 - y)^{q-1}/B(p, q)$, $y \in [0, 1]$, and shape
parameters $(p, q)$. The distribution of $X$ is

$$F(x) = P(X \le x) = P\left(Y \le \frac{x - a}{b - a}\right) = \frac{1}{B(p, q)} \int_0^{\xi} y^{p-1}(1 - y)^{q-1}\, dy = I(\xi; p, q),$$

where $\xi = (x - a)/(b - a)$ and $I(\xi; p, q)$ denotes the incomplete Beta function
ratio.

Suppose $(p, q)$ are known but $D = [a, b]$ is unknown, and let $\hat{D} = [\hat{a}, \hat{b}]$
be an approximation for $D$. The distribution of $\hat{X} = \hat{a} + (\hat{b} - \hat{a})Y$ is

$$\hat{F}(x) = P(\hat{X} \le x) = P\left(Y \le \frac{x - \hat{a}}{\hat{b} - \hat{a}}\right) = I(\hat\xi; p, q),$$

where $\hat\xi = (x - \hat{a})/(\hat{b} - \hat{a})$. The difference between $\hat{F}(x)$ and
$F(x)$ results by direct calculations and is

$$|\hat{F}(x) - F(x)| = \frac{1}{B(p, q)} \left|\int_{\xi}^{\hat\xi} y^{p-1}(1 - y)^{q-1}\, dy\right|.$$

Figure 6.5 shows errors $|\hat{F}(x) - F(x)|$ in the range $[2, 4]$ for $p = 3$, $q = 2$,
$a = 2$, $b = 4$, $\{\hat{a}, \hat{b}\} = \{1.8, 4.2\}$ (heavy solid line), $\{2.8, 3.8\}$
(dashed line), $\{1.8, 4.0\}$ (dotted line), and $\{2.0, 4.2\}$ (thin solid line). The error
depends strongly on differences between $D$ and $\hat{D}$. Moreover, the image
$\hat{G} = \Phi^{-1} \circ \hat{F}(X)$ of $X$ based on the approximate range of this random
variable is not Gaussian since
$P(\Phi^{-1} \circ \hat{F}(X) \le x) = P(X \le \hat{F}^{-1} \circ \Phi(x)) = F(\hat{F}^{-1} \circ \Phi(x)) \ne \Phi(x)$. ◊
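
The error $|\hat{F}(x) - F(x)|$ of Example 6.1 can be evaluated numerically with a few lines
of code. The sketch below computes the incomplete Beta function ratio by Simpson
integration, which is an implementation choice made here to stay within the standard
library; the function names and the evaluation point are assumptions.

```python
import math


def beta_cdf(xi, p, q, n=2000):
    """Incomplete Beta function ratio I(xi; p, q) by Simpson's rule (n must be even)."""
    xi = min(max(xi, 0.0), 1.0)  # the standard Beta variable lives on [0, 1]
    if xi == 0.0:
        return 0.0
    beta_pq = math.gamma(p) * math.gamma(q) / math.gamma(p + q)
    step = xi / n

    def integrand(y):
        return y ** (p - 1) * (1.0 - y) ** (q - 1)

    total = integrand(0.0) + integrand(xi)
    for j in range(1, n):
        total += (4.0 if j % 2 else 2.0) * integrand(j * step)
    return total * step / (3.0 * beta_pq)


def support_error(x, a, b, a_hat, b_hat, p, q):
    """|F_hat(x) - F(x)| for the exact range [a, b] and the approximate range [a_hat, b_hat]."""
    xi = (x - a) / (b - a)
    xi_hat = (x - a_hat) / (b_hat - a_hat)
    return abs(beta_cdf(xi_hat, p, q) - beta_cdf(xi, p, q))


# Parameters of Example 6.1 with the approximate range {1.8, 4.2}, evaluated at x = 2.
err = support_error(2.0, a=2.0, b=4.0, a_hat=1.8, b_hat=4.2, p=3.0, q=2.0)
print(err)
```

For $p = 3$ and $q = 2$ the integrand is a cubic polynomial, so Simpson's rule is exact up to
rounding and the result can be checked against $I(\xi; 3, 2) = 4\xi^3 - 3\xi^4$.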

Fig. 6.5 $|\hat{F}(x) - F(x)|$ for $p = 3$, $q = 2$, $a = 2$, $b = 4$,
$\{\hat{a}, \hat{b}\} = \{1.8, 4.2\}$ (heavy solid line), $\{2.8, 3.8\}$ (dashed line),
$\{1.8, 4.0\}$ (dotted line), and $\{2.0, 4.2\}$ (thin solid line)


6.2.4 Directional Wind Speed for Hurricanes 

Hurricanes occur at random times on the United States Gulf and East coasts, and are
characterized by high wind speeds with random directions. Model construction for
hurricanes is largely based on data since hurricane physics is rather complex.

It has been proposed in [14] to model (1) the arrival times $T_1 < T_2 < \cdots$ of
hurricanes at a site during the hurricane season by a homogeneous Poisson process
$N(t)$ with intensity $\lambda > 0$ and (2) the wind speed during distinct hurricanes by iid
$\mathbb{R}^d$-valued random variables giving wind speeds in $d > 1$ directions. Let
$F(\cdot; \theta)$ denote the distribution of the wind speed vector, where $\theta$ collects
the uncertain parameters in the expression of this distribution. The Poisson process $N(t)$
for hurricane arrivals and the distribution $F(\cdot; \theta)$ specify the functional form of
the probability law of the hurricane process. Wind speeds recorded during hurricanes and the
number of hurricanes over a reference period can be used to calibrate this process.

Let $V$ be an $\mathbb{R}^d$-valued random variable denoting wind speeds during a hurricane.
The coordinates $\{V_k\}$ of $V$ are wind speeds in $d$ equally spaced directions. It is
assumed that $V$ is a translation vector with coordinates

$$V_k = F_k^{-1}(\Phi(G_k)), \quad k = 1, \ldots, d, \qquad (6.15)$$

where $\{F_k\}$ denote wind speed distributions and $\{G_k\}$, $k = 1, \ldots, d$, are
correlated standard Gaussian variables with covariance matrix
$\rho = \{\rho_{kl} = E[G_k G_l]\}$, $k, l = 1, \ldots, d$. Under this model, the functional form
of the law of the hurricane process is completely specified by the intensity $\lambda$ of the
Poisson process $N(t)$, the marginal distributions $\{F_k\}$ of $V$, and the covariance matrix
$\rho$ of $G = \{G_k\}$.

Suppose extreme wind speeds have been recorded in $d$ directions during $n$ distinct
events occurring over a period of $n_y$ years. This data set is used to calibrate the
hurricane process. The average number $\lambda$ of hurricanes per year can be estimated by
the ratio $\hat\lambda = n/n_y$. This estimate is accurate if $n$ is relatively large.

The calibration of the probability law of $V$ involves two steps. First, the parameters
of the distributions $\{F_k\}$ of the directional wind speeds $\{V_k\}$ need to be estimated.
Second, the covariance matrix $\rho$ of the Gaussian image $G$ of $V$ has to be identified.
We proceed with the first step. Since some directional wind speeds are zero, the
marginal distributions $\{F_k\}$ are modeled by

$$F_k(x) = q_k 1(x \ge 0) + (1 - q_k)\tilde{F}_k(x), \quad k = 1, \ldots, d, \qquad (6.16)$$

where $q_k = P(V_k = 0)$ and $\tilde{F}_k$ denotes the distribution of non-zero values of
$V_k$. The functional forms of the distributions $\{\tilde{F}_k\}$ are assumed to be known.
The probability $q_k$ can be approximated by the ratio $\hat{q}_k = n_k/n$ for sufficiently
large $n_k$ and $n$, where $n_k$ denotes the number of observed zero wind speeds in direction
$k = 1, \ldots, d$. The parameters of the distributions $\{\tilde{F}_k\}$ can be estimated
from the non-zero readings of directional wind speeds $\{V_k\}$. For example, suppose
$\{\tilde{F}_k\}$ are reverse Weibull distributions with parameters
$(\alpha_k, \xi_k, c_k)$, a common assumption in wind studies [37]. Let $Y$ be a Weibull
random variable with parameters $\alpha > 0$, $\xi \in \mathbb{R}$, and $c > 0$, and
distribution

$$F(y) = \begin{cases} 1 - \exp\left[-\left(\dfrac{y - \xi}{\alpha}\right)^c\right], & y > \xi \\ 0, & y \le \xi, \end{cases} \qquad (6.17)$$

so that the reverse Weibull variable $X = -Y$ has the distribution

$$F(x) = P(X \le x) = P(Y > -x) = \exp\left[-\left(\frac{-x - \xi}{\alpha}\right)^c\right], \quad x < -\xi. \qquad (6.18)$$

Moments of any order of $Y$ can be obtained from moments $E[\tilde{Y}^q] = \Gamma(1 + q/c)$ of
the scaled random variable $\tilde{Y}$ defined by $Y = \xi + \alpha\tilde{Y}$ ([24], Chap. 20).
For example, the mean $\mu_y$, variance $\sigma_y^2$, and skewness $\gamma_{y,3}$ of $Y$ are

$$\mu_y = \xi + \alpha\Gamma(1 + 1/c)$$
$$\sigma_y^2 = \alpha^2\left(\Gamma(1 + 2/c) - \Gamma(1 + 1/c)^2\right)$$
$$\gamma_{y,3} = \frac{\Gamma(1 + 3/c) - 3\Gamma(1 + 1/c)\Gamma(1 + 2/c) + 2\Gamma(1 + 1/c)^3}{\left(\Gamma(1 + 2/c) - \Gamma(1 + 1/c)^2\right)^{3/2}}. \qquad (6.19)$$

Let $v_{k,1}, v_{k,2}, \ldots, v_{k,n-n_k}$ be non-zero wind speed data in a direction
$k = 1, \ldots, d$ recorded at a site. The method of moments, the method of maximum
likelihood, the method of probability-weighted moments, and other methods can be used to
estimate the parameters of this distribution ([26], Chap. 22). Extensive numerical
studies in [28] suggest that the method of moments delivers satisfactory estimates
for the parameters of $\tilde{F}_k$ and is superior to, for example, the maximum likelihood
method.
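
The moment expressions in (6.19) are the building blocks of the method of moments for
Weibull-type distributions. A minimal sketch, with an assumed function name and test values,
is:

```python
import math


def weibull_moments(alpha, xi, c):
    """Mean, variance, and skewness of a Weibull variable Y with scale alpha,
    location xi, and shape c, following (6.19)."""
    g1 = math.gamma(1.0 + 1.0 / c)
    g2 = math.gamma(1.0 + 2.0 / c)
    g3 = math.gamma(1.0 + 3.0 / c)
    mean = xi + alpha * g1
    variance = alpha ** 2 * (g2 - g1 ** 2)
    skewness = (g3 - 3.0 * g1 * g2 + 2.0 * g1 ** 3) / (g2 - g1 ** 2) ** 1.5
    return mean, variance, skewness


# For c = 1 the Weibull variable is a shifted exponential: mean xi + alpha,
# variance alpha^2, skewness 2.
print(weibull_moments(alpha=2.0, xi=1.0, c=1.0))
```

Since the skewness depends only on $c$, the shape parameter can be found first by matching
the sample skewness, after which $\alpha$ and $\xi$ follow from the variance and the mean.
For the reverse Weibull variable $X = -Y$ the mean and the skewness change sign and the
variance is unchanged.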

The covariance matrix $\rho$ can be estimated from the Gaussian image of the available
directional wind speed data defined by the mapping $G_k = \Phi^{-1}(F_k(V_k))$,
$k = 1, \ldots, d$, in (6.15). This mapping cannot be used directly since the distributions
$\{F_k(x)\}$ have discontinuities at $x = 0$. To overcome this difficulty, all zero readings
in the data set have been changed into independent samples of $U(0, \varepsilon)$, where
$\varepsilon > 0$ is a parameter much smaller than all non-zero wind speed readings. The
distribution of the corresponding wind speed vector $V^*$ becomes

$$F_k^*(x) = q_k\left[\frac{x}{\varepsilon} 1(0 < x \le \varepsilon) + 1(x > \varepsilon)\right] + (1 - q_k)\tilde{F}_k(x), \quad k = 1, \ldots, d, \qquad (6.20)$$

so that its Gaussian image is $G_k^* = \Phi^{-1} \circ F_k^*(V_k^*)$. This relationship maps
wind data into Gaussian data, and is used to estimate the covariance matrix $\rho^*$ of
$G^* = (G_1^*, \ldots, G_d^*)$ [14].

Fig. 6.6 Data and model-based wind speeds in Miami for MRIs up to 1,800 years (left panel)
and 18,000 years (right panel)

The resulting model for the hurricane process can be used to generate direc- 
tional wind speed sequences with independent and dependent coordinates following 
the same marginal distributions. The dependence between directional wind speeds 
corresponds to that between the coordinates of the Gaussian vector G*. If the coor- 
dinates of G* are assumed to be independent, so are the corresponding directional 
wind speeds. 

Let $F_{\max}$ denote the distribution of the largest wind speed $V_{\max}$ irrespective of
direction during a hurricane. The wind speed $V_{\max}$ exceeds $v_R$ on average every $R$
years, where $v_R$ is the solution of $R = 1/[\lambda(1 - F_{\max}(v_R))]$. The average time
$R$ is referred to as mean recurrence interval (MRI) for wind speed $v_R$. Figure 6.6 shows
wind speeds $v_R$ in Miami for MRIs up to 1,800 years (left panel) and 18,000 years
(right panel). The heavy solid and dotted lines are predictions of the hurricane models
with dependent and independent directional wind speeds described in this section.
The models have been calibrated to data generated by a Monte Carlo algorithm based
on physical models for hurricane wind flow ([36], Chap. 3). The thin solid lines
are estimates of $v_R$ obtained directly from this data set. Note that wind speeds for
large MRIs cannot be obtained directly from data, the assumption of independence
between directional wind speeds is conservative, and the hurricane model can be
used to generate wind speeds of arbitrary MRI.
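
Given the intensity $\lambda$ and a distribution $F_{\max}$, the wind speed $v_R$ for a
prescribed MRI $R$ solves $F_{\max}(v_R) = 1 - 1/(\lambda R)$. The sketch below finds $v_R$ by
bisection; the exponential form of `f_max`, the intensity, and the search bracket are
hypothetical choices made only to obtain a runnable example.

```python
import math


def wind_speed_for_mri(mri_years, lam, f_max, lo, hi, tol=1e-8):
    """Solve R = 1 / [lam (1 - F_max(v))] for v by bisection on [lo, hi].

    Assumes lam * mri_years > 1 and that f_max is nondecreasing with
    f_max(lo) < 1 - 1/(lam R) <= f_max(hi).
    """
    target = 1.0 - 1.0 / (lam * mri_years)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if f_max(mid) < target:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


def f_max(v):
    """Hypothetical distribution of the largest wind speed (illustration only)."""
    return 1.0 - math.exp(-v / 10.0)


v_100 = wind_speed_for_mri(100.0, lam=0.5, f_max=f_max, lo=0.0, hi=200.0)
print(v_100)
```

In an application `f_max` would be estimated from wind speed vectors generated by the
calibrated translation model rather than specified in closed form.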


6.3 Random Functions

Let $X(s)$, $s \in D$, be an $\mathbb{R}^d$-valued random function on a probability space
$(\Omega, \mathcal{F}, P)$, where $D$ is a subset of a Euclidean space. Let
$(x^{(1)}(s), \ldots, x^{(n)}(s))$, $s \in D$, be $n$ independent samples of $X$. Suppose the
functional form of the finite dimensional distributions of $X$ has been selected up to a
vector $\theta$ of uncertain parameters. Our objective is to estimate $\theta$ from data and
any other available information.

Estimates developed in the previous section for random variables can be extended
to random functions. First, $X(s)$, $s \in D$, is replaced with a random vector $X$
consisting of values of this random function at a finite number of arguments $s_k \in D$.
The functional form of the distribution of $X$ results from the finite dimensional
distributions of $X(s)$. Second, developments in the previous section can be applied
to characterize the uncertain parameters $\theta$ in the distribution of $X$ from samples of
this vector that can be extracted from samples of $X(s)$.

The selection of the sampling points $s_k \in D$ used to define $X$ needs to account for
the temporal/spatial correlation of $X$, which is not known. Selecting arguments $\{s_k\}$
that are closely spaced relative to the correlation distance of $X(s)$ is inefficient and can
cause numerical difficulties since it results in large vectors $X$ that may have strongly
dependent coordinates. On the other hand, widely spaced arguments $\{s_k\}$ may provide an
inaccurate description of the frequency content of $X(s)$.

Example 6.2 Let $X(t)$, $t \ge 0$, be a real-valued stationary Gaussian process with
mean 0 and covariance function $c(\tau) = E[X(t)X(t + \tau)] = \exp(-\lambda|\tau|)$. Suppose
that $\lambda > 0$ is unknown and that $n$ independent samples of $X(t)$ have been recorded
in a time interval $[0, \bar{t}]$ at times $0 = t_0 < t_1 < \cdots < t_m = \bar{t}$. The
recorded values of $X(t)$ constitute $n$ independent samples of an
$\mathbb{R}^{m+1}$-valued random variable $X \sim N(0, \gamma)$, where
$\gamma = \{c(t_i - t_j),\ i, j = 0, 1, \ldots, m\}$. The likelihood function of $\lambda$
corresponding to the samples of $X$ is

$$\ell(\lambda \mid \text{data}) \propto \det(\gamma)^{-n/2} \exp\left[-\frac{1}{2}\sum_{i=1}^{n} (x^{(i)})^T \gamma^{-1} x^{(i)}\right],$$

where $x^{(i)} = (x_i(t_0), x_i(t_1), \ldots, x_i(t_m))^T$ and $x_i(t)$ denotes sample
$i = 1, \ldots, n$ of $X(t)$ in $[0, \bar{t}]$. The maximum likelihood estimate
$\hat\lambda$ of $\lambda$ maximizes $\ell(\lambda \mid \text{data})$. The likelihood function
$\ell(\lambda \mid \text{data})$ and prior densities $f'(\lambda)$ of $\lambda$ can be used to
calculate posterior densities $f''(\lambda)$.

The heavy dash, heavy solid, and thin solid lines in Fig. 6.7 are likelihood functions
$\ell(\lambda \mid \text{data})$ scaled by their largest value for
$(\bar{t} = 10, m = 100)$, $(\bar{t} = 100, m = 100)$, and $(\bar{t} = 200, m = 100)$,
respectively, so that the corresponding sampling intervals are $\Delta t = 0.1$,
$\Delta t = 1$, and $\Delta t = 2$. The sample of $X(t)$ used in the analysis is for
$\lambda = 1$. The likelihood functions for $\Delta t > 0.1$ provide less and less information
on $\lambda$. On the other hand, the likelihood function for $\Delta t = 0.1$ is concentrated
on the actual value of $\lambda$, suggesting that $\Delta t \le 0.1$ is an adequate sampling
interval for this process in the sense that it is capable of identifying the uncertain
parameter $\lambda$. ◊
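
The likelihood of Example 6.2 can be evaluated without forming $\gamma^{-1}$ explicitly. For
the exponential covariance function and equally spaced times, the Gaussian vector of recorded
values is a first-order autoregressive sequence, so the $N(0, \gamma)$ density factors into
one-step conditional Gaussian densities. The sketch below uses this factorization, which is
an implementation choice made here and not a step taken in the text; the grid of candidate
values of $\lambda$ and the seed are assumptions.

```python
import math
import random


def simulate_path(lam, dt, m, rng):
    """Values X(0), X(dt), ..., X(m dt) of a zero-mean Gaussian process with c(tau) = exp(-lam |tau|)."""
    r = math.exp(-lam * dt)
    path = [rng.gauss(0.0, 1.0)]
    for _ in range(m):
        path.append(r * path[-1] + math.sqrt(1.0 - r * r) * rng.gauss(0.0, 1.0))
    return path


def log_likelihood(lam, path, dt):
    """Log of the N(0, gamma) density of one path, via the Markov factorization."""
    r = math.exp(-lam * dt)
    v = 1.0 - r * r  # conditional variance of X(t + dt) given X(t)
    ll = -0.5 * (math.log(2.0 * math.pi) + path[0] ** 2)
    for prev, cur in zip(path, path[1:]):
        ll += -0.5 * (math.log(2.0 * math.pi * v) + (cur - r * prev) ** 2 / v)
    return ll


rng = random.Random(0)
dt = 0.1
path = simulate_path(lam=1.0, dt=dt, m=100, rng=rng)
grid = [0.05 * j for j in range(1, 101)]  # candidate values of lam in (0, 5]
lam_hat = max(grid, key=lambda lam: log_likelihood(lam, path, dt))
print(lam_hat)
```

For $n$ independent samples the log-likelihoods of the individual paths are added before
the maximization.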

Fig. 6.7 Scaled likelihood functions $\ell(\lambda \mid \text{data})$ for
$(\bar{t} = 10, m = 100)$ (heavy dash line), $(\bar{t} = 100, m = 100)$ (heavy solid line),
and $(\bar{t} = 200, m = 100)$ (thin solid line) based on a single sample of $X(t)$ with
$\lambda = 1$


6.3.1 Systems with Uncertain Parameters

Suppose the probability $P(W > w)$ provides a metric for evaluating the performance
of a system with $m$ components that have uncertain properties. Our objective is to
estimate $P(W > w)$ from the available information consisting of physics and data.

The physics implies (1) $W = \sum_{k=1}^{m} c_k \bar{X}_k$, where $c_k$ are known constants,
$\bar{X}_k = (1/l)\int_0^l X_k(s)\, ds$, $l > 0$, and $\{X_k(s)\}$ are independent copies of a
real-valued random field $X(s)$, $s \ge 0$, and (2) $X(s)$ takes values in a bounded interval
of $(0, \infty)$. An example of such a system is a statically determinate truss with $m$
linear elastic components whose compliances $X(s)$ vary randomly along them.

The data set consists of $n$ independent samples $(x_1, \ldots, x_n)$ of $X(s)$ at a fixed
argument $s$ and $n$ independent samples $(\bar{x}_1, \ldots, \bar{x}_n)$ of $\bar{X}$. The two
data sets are referred to as local and global compliances, respectively.

The available information is insufficient for constructing models for $X(s)$ needed
to calculate $P(W > w)$. Several assumptions are made to characterize $X(s)$. First,
assume that $X(s)$ is a homogeneous random field so that $\mu = E[X(s)]$ and
$\sigma^2 = \mathrm{Var}[X(s)] = E[(X(s) - \mu)^2]$ are space-invariant,
$c(\tau) = \sigma^2\xi(\tau) = E[(X(s + \tau) - \mu)(X(s) - \mu)]$ depends only on the lag
$\tau$, $\bar\mu = E[\bar{X}] = \mu$, and



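$$\bar\sigma^2 = \mathrm{Var}[\bar{X}] = \frac{\sigma^2}{l^2}\int_0^l\!\!\int_0^l \xi(u - v)\, du\, dv = \frac{\sigma^2}{l}\int_{-l}^{l}\left(1 - \frac{|\tau|}{l}\right)\xi(\tau)\, d\tau. \qquad (6.21)$$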


Since $|\xi(\tau)| \le 1$, we have $\bar\sigma^2 \le \sigma^2$. Equality corresponds to the case
of a perfectly correlated random field $X(s)$, that is, $\xi(\tau) = 1$. Figure 6.8 shows the
available data set consisting of $n = 30$ measurements of $X(s)$ at an arbitrary $s$ and
$n = 30$ measurements of $\bar{X}$ for $l = 20$. The estimates of the mean and standard
deviation of $X(s)$ and $\bar{X}$ are $(\hat\mu = 0.08644, \hat\sigma = 0.00611)$ and
$(\hat{\bar\mu} = 0.08862, \hat{\bar\sigma} = 0.00363)$, respectively. The approximate equality
$\hat\mu \simeq \hat{\bar\mu}$ is consistent with the postulated stationarity of $X(s)$ and the
inequality $\hat{\bar\sigma} < \hat\sigma$ indicates that $X(s)$ is not perfectly correlated.

Second, assume that the correlation of $X(s)$ is exponential. We model $X(s)$ as an
exponentially correlated diffusion process, that is,

$$X(s) = \mu + \sigma Y(s), \quad s \in [0, l],$$
$$dY(s) = -\lambda Y(s)\, ds + b(Y(s))\, dB(s), \quad \lambda > 0, \qquad (6.22)$$

with diffusion coefficients

$$b(y) = (2\lambda)^{1/2} \quad \text{(Gaussian distribution)}$$
$$b(y) = (\lambda(3 - y^2))^{1/2}, \quad y \in [-\sqrt{3}, \sqrt{3}], \quad \text{(Uniform distribution)}, \qquad (6.23)$$

where $B$ denotes a Brownian motion. Note that the models defined by (6.22) and
(6.23) depend on the uncertain parameters $(\mu, \sigma, \lambda)$ and have the same second
moment properties but different distributions. Also, the Gaussian model is physically
inconsistent since the compliance field is bounded and strictly positive. This
completes the first phase of model construction, that is, the selection of functional
forms for the probability law of $X(s)$.

Model calibration, that is, the estimation of $(\mu, \sigma, \lambda)$, constitutes the second
phase of model construction. We have found $\hat\mu = 0.08644 \simeq \hat{\bar\mu}$,
$\hat\sigma = 0.00611$, and $\hat{\bar\sigma} = 0.00363$ from the available data. Since
$\xi(\tau) = E[Y(s + \tau)Y(s)] = \exp(-\lambda|\tau|)$ for the postulated models, the ratio
$h(\lambda) = \bar\sigma/\sigma$ implied by (6.21) has the expression

$$h(\lambda) = \frac{\sqrt{2(\lambda l + \exp(-\lambda l) - 1)}}{\lambda l}, \qquad (6.24)$$

which, together with the estimates of $\sigma$ and $\bar\sigma$, gives
$\hat\lambda = 0.2205$. The models for $X(s)$ are completely specified by replacing
$(\mu, \sigma, \lambda)$ with their point estimates $(\hat\mu, \hat\sigma, \hat\lambda)$, so that
$P(W > w)$ can now be calculated. Alternative procedures can be used to quantify the
uncertainty in $(\mu, \sigma, \lambda)$ by, for example, the posterior density of these
parameters [17].

Fig. 6.8 Data on $X(s)$ and $\bar{X}$ (axis label: local compliance, $\{x_i\}$)

For $(\mu, \sigma, \lambda)$ set equal to $(\hat\mu, \hat\sigma, \hat\lambda)$ the mean and
variance of $W$ are

$$W = \sum_{k=1}^{m} c_k \bar{X}_k \sim \left(\mu_w = \mu\sum_{k=1}^{m} c_k,\ \sigma_w^2 = \sigma^2 h(\lambda)^2 \sum_{k=1}^{m} c_k^2\right)$$

so that $P(W > w) = \Phi((\mu_w - w)/\sigma_w)$ under the Gaussian model. Since $P(W > w)$
is not available analytically under the uniform model, it has been obtained by Monte
Carlo simulation. Figure 6.9 shows the probability $P(W > w)$ under the models in
(6.22) and (6.23). Although the Gaussian model is physically inconsistent, it provides
a satisfactory approximation for $P(W > w)$.

Fig. 6.9 Failure probability $P(W > w)$ under uniform and Gaussian models
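
The calibration of $\lambda$ reduces to a one-dimensional root-finding problem. For an
exponential correlation function $\exp(-\lambda|\tau|)$, the variance of the average of the
field over $[0, l]$ divided by the point variance equals
$2(\lambda l + e^{-\lambda l} - 1)/(\lambda l)^2$, which decreases with $\lambda$. The sketch
below matches this ratio to the squared ratio of the two estimated standard deviations by
bisection; the function names and the search bracket are assumptions.

```python
import math


def variance_ratio(lam, l):
    """Var[average of X over [0, l]] / Var[X(s)] for correlation exp(-lam |tau|)."""
    u = lam * l
    return 2.0 * (u + math.exp(-u) - 1.0) / (u * u)


def calibrate_lambda(sigma_bar, sigma, l, lo=1e-6, hi=50.0):
    """Find lam with variance_ratio(lam, l) = (sigma_bar / sigma)^2 by bisection."""
    target = (sigma_bar / sigma) ** 2
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if variance_ratio(mid, l) > target:  # ratio too large: correlation decays too slowly
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Estimates quoted in the text: sigma_bar = 0.00363, sigma = 0.00611, l = 20.
lam_hat = calibrate_lambda(0.00363, 0.00611, 20.0)
print(lam_hat)
```

With the rounded estimates quoted in the text, the computed value is roughly 0.22, in line
with the reported $\hat\lambda = 0.2205$; the small difference is presumably due to rounding
of the inputs.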

6.3.2 Inclusions in Multi-Phase Materials 

The geometry and spatial distribution of inclusions can significantly affect global
properties of multi-phase materials, for example, strength and permeability of
concrete and various properties of other types of composite materials [5]. This
section constructs and calibrates a probabilistic model capable of generating virtual
inclusions with arbitrary geometry that are statistically consistent with a particular
population. The presentation is based on developments in [18].


6.3.2.1 Spherical Harmonics

Consider an arbitrary inclusion occupying a bounded subset $D$ of $\mathbb{R}^3$. Set the
origin $O$ of this space at the centroid of $D$, and let $g(\theta, \varphi)$ denote the
distance from $O$ to the boundary $\partial D$ of $D$, where
$(\theta, \varphi) \in [0, \pi] \times [0, 2\pi]$ are spherical coordinates. It is
assumed that $g$ is a smooth function and $D$ is star-like, that is, every ray from the origin
of $\mathbb{R}^3$ intersects the boundary $\partial D$ of $D$ at a single point [6, 29, 34]. The
relationship between spherical and Cartesian coordinates,
$(\theta, \varphi, g(\theta, \varphi))$ and $(x_1, x_2, x_3)$, is given by
$x_1 = g(\theta, \varphi)\sin(\theta)\cos(\varphi)$,
$x_2 = g(\theta, \varphi)\sin(\theta)\sin(\varphi)$, and
$x_3 = g(\theta, \varphi)\cos(\theta)$. If $g(\theta, \varphi) = 1$ for all
$\theta \in [0, \pi]$ and $\varphi \in [0, 2\pi]$, then $D$ is the unit sphere in
$\mathbb{R}^3$.
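
The mapping from spherical coordinates to boundary points can be coded directly. In the
sketch below the radial function `g` is passed as a callable; the function name and the test
angles are assumptions.

```python
import math


def boundary_point(theta, phi, g):
    """Cartesian coordinates (x1, x2, x3) of the boundary point in direction (theta, phi)."""
    r = g(theta, phi)
    return (
        r * math.sin(theta) * math.cos(phi),
        r * math.sin(theta) * math.sin(phi),
        r * math.cos(theta),
    )


def unit_radius(theta, phi):
    """Radial function of the unit sphere, g = 1 in every direction."""
    return 1.0


point = boundary_point(1.1, 2.3, unit_radius)
radius = math.sqrt(sum(c * c for c in point))
print(point, radius)  # the radius should equal 1 up to rounding
```

For a star-like inclusion, evaluating `boundary_point` on a grid of $(\theta, \varphi)$ values
yields a surface mesh of the inclusion.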

We use spherical harmonics to represent $g(\theta, \varphi)$ since these functions provide
efficient representations for inclusions with arbitrary shapes [11, 22, 34]. The
representations of $g(\theta, \varphi)$ are truncated Fourier series in spherical harmonics of
the type

$$g_n(\theta, \varphi) = \sum_{k=0}^{n} y_k(\theta, \varphi) = \sum_{k=0}^{n}\sum_{h=-k}^{k} d_{k,h}\, u_{k,h}(\theta, \varphi), \quad n = 0, 1, \ldots, \qquad (6.25)$$

on $[0, \pi] \times [0, 2\pi]$, where

$$y_k(\theta, \varphi) = a_{k,0} P_k(\cos(\theta)) + \sum_{h=1}^{k} \left(a_{k,h}\cos(h\varphi) + b_{k,h}\sin(h\varphi)\right) P_{k,h}(\cos(\theta)), \quad k = 0, 1, \ldots, \qquad (6.26)$$

are spherical harmonic functions,

$$a_{k,0} = d_{k,0}\left(\frac{2k + 1}{4\pi}\right)^{1/2},$$
$$a_{k,h} = (-1)^h \left(\frac{(2k + 1)(k - h)!}{4\pi(k + h)!}\right)^{1/2} 2\,\Re[d_{k,h}],$$
$$b_{k,h} = (-1)^{h+1} \left(\frac{(2k + 1)(k - h)!}{4\pi(k + h)!}\right)^{1/2} 2\,\Im[d_{k,h}], \qquad (6.27)$$

$$d_{k,h} = \int_{-\pi}^{\pi}\int_0^{\pi} g(\theta, \varphi)\, u_{k,h}^*(\theta, \varphi)\sin(\theta)\, d\theta\, d\varphi, \qquad (6.28)$$

$$u_{k,h}(\theta, \varphi) = \left(\frac{(2k + 1)(k - h)!}{4\pi(k + h)!}\right)^{1/2} P_{k,h}(\cos(\theta))\, e^{ih\varphi}, \qquad (6.29)$$

$$P_{k,h}(x) = (1 - x^2)^{h/2}\frac{d^h}{dx^h} P_k(x), \qquad (6.30)$$

denote associated Legendre polynomials, and

$$P_k(x) = \frac{1}{2^k k!}\frac{d^k}{dx^k}(x^2 - 1)^k \qquad (6.31)$$

are Legendre polynomials. If $g$ has second order continuous partial derivatives, the
sequence of functions $g_n$ in (6.25) converges absolutely and uniformly to $g$ in
$[0, \pi] \times [0, 2\pi]$ as $n \to \infty$. If $g$ is square integrable on the unit sphere,
the convergence $g_n \to g$ as $n \to \infty$ is in the mean square sense ([32], Theorem 9,
p. 726).

The first term flo, 07*0(208(0)) = ao.O of the series g n (9, q> ) in (6.25) gives the 
radius of a sphere providing a first order approximation for an inclusion. The subse- 
quent terms in the expression of g n (0, (p) capture differences between inclusions and 
their approximating spheres. Generally, n ~ 20 terms suffice to represent accurately 
particles with complex geometry [11]. The number of coefficients (c/^ q , «£,/, bpj) 
in the expression of g n is (n + l) 2 . 
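The count $(n+1)^2$ follows from one coefficient $a_{k,0}$ and $k$ pairs $(a_{k,h}, b_{k,h})$ for each degree $k = 0, \ldots, n$. A minimal sketch of this bookkeeping is given below; the function name is illustrative and not part of the text.

```python
def count_coefficients(n: int) -> int:
    """Count the coefficients (a_{k,0}, a_{k,h}, b_{k,h}) of g_n in (6.25)-(6.26)."""
    total = 0
    for k in range(n + 1):
        total += 1      # a_{k,0}
        total += 2 * k  # a_{k,h} and b_{k,h} for h = 1, ..., k
    return total


# The sum of (2k + 1) over k = 0, ..., n equals (n + 1)^2.
for order in (0, 1, 20, 30):
    print(order, count_coefficients(order), (order + 1) ** 2)
```

For $n = 30$ the count matches the 961 rows of the coefficient matrix used in Sect. 6.3.2.2.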


6.3.2.2 Data Set 

Measurements performed on $n_s = 128$ aggregates commonly found in concrete with
sizes ranging from about 4 to 28 mm have been used to calculate the coefficients
$(a_{k,0}^{(i)}, a_{k,h}^{(i)}, b_{k,h}^{(i)})$ of $g_n^{(i)}(\theta, \varphi)$, $i = 1, \ldots, n_s$, in (6.25) and (6.27). The resulting
coefficients have been organized in a matrix $a_{\text{coeff}}$ with $n_s = 128$ columns corresponding
to the collection of aggregates and $31^2 = 961$ rows for $n = 30$. For example,
column $i$ in $a_{\text{coeff}}$ consists of the coefficients $(a_{k,0}^{(i)}, a_{k,h}^{(i)}, b_{k,h}^{(i)})$ for $k = 0, 1, \ldots, n$ and
$h = 1, \ldots, k$, corresponding to aggregate $i$. The first row of $a_{\text{coeff}}$ gives the radii
$a_{0,0}^{(i)}$ of the spheres associated with the set of aggregates.

Statistical analysis in [18] suggests scaling the aggregates by the radii of their
first order spherical approximations and assuming that (1) the scaled aggregates belong
to the same population and (2) the properties of the scaled aggregates and the scale
of these aggregates are independent random elements. Let $Z_1 > 0$ be the random
variable giving aggregate scale, so that the coefficients $a_{0,0}^{(i)}$, $i = 1, \ldots, n_s$, in the
first row of $a_{\text{coeff}}$ are interpreted as independent samples of $Z_1$.

Figure 6.10 shows estimates of the mean, standard deviation, skewness, and
kurtosis for the scaled data $h_n^{(i)}(\theta, \varphi) = g_n^{(i)}(\theta, \varphi) / a_{0,0}^{(i)}$, $i = 1, \ldots, n_s$, with
$n = 30$ in the top left, top right, bottom left, and bottom right panels as functions
of the arguments $(\theta, \varphi) \in [0, \pi] \times [0, 2\pi]$. Since the estimates vary with $(\theta, \varphi)$,
skewness is positive, and kurtosis differs from 3, a non-Gaussian inhomogeneous
random field is needed to capture aggregate geometry.


6.3.2.3 Model Calibration 

Let $(\Omega, \mathcal{F}, P)$ be a probability space, where the sample space $\Omega$ consists of all
possible outcomes, that is, all possible virtual aggregates, the $\sigma$-field $\mathcal{F}$ collects all
relevant events, and $P$ is a probability measure defined on $\mathcal{F}$. Let

$$ H : [0, \pi] \times [0, 2\pi] \times \Omega \to [0, \infty) \tag{6.32} $$

be a random field defined on this probability space, whose samples constitute virtual
aggregates. Since $H(\theta, \varphi, \cdot) : \Omega \to [0, \infty)$ is a random variable for a fixed $(\theta, \varphi)$
giving the distance from the centroid to the boundary point $(\theta, \varphi)$, it cannot take negative values. The
functions $h^{(i)}(\theta, \varphi)$, $i = 1, \ldots, n_s$, are viewed as independent samples of $H(\theta, \varphi)$,
so that $H$ models the scaled aggregates and

$$ G(\theta, \varphi) = Z_1 H(\theta, \varphi), \quad (\theta, \varphi) \in [0, \pi] \times [0, 2\pi], \tag{6.33} $$

models actual size aggregates. Recall that $Z_1$ is assumed to be independent of $H$.

Fig. 6.10 Estimates of mean, standard deviation, skewness, and kurtosis for a population of
$n_s = 128$ scaled aggregates

We approximate $H$ by a sequence of translation random fields $\{H_{T,n}\}$,
$n = 0, 1, \ldots$, defined by

$$ H_{T,n}(\theta, \varphi) = F^{-1}\big( \Phi(N_n(\theta, \varphi)); \theta, \varphi \big), \quad n = 0, 1, \ldots, \tag{6.34} $$

where $F(\cdot\,; \theta, \varphi)$ denotes the marginal distribution of $H$ at $(\theta, \varphi) \in [0, \pi] \times [0, 2\pi]$,
$\Phi$ denotes the distribution of the standard Gaussian variable,
and $\{N_n(\theta, \varphi)\}$, $n = 0, 1, \ldots$, is the sequence of Gaussian fields with mean 0,
variance 1, and covariance function

$$ \rho_n((\theta, \varphi), (\theta', \varphi')) = E[N_n(\theta, \varphi) N_n(\theta', \varphi')]. \tag{6.35} $$

The random fields $H_{T,n}$ are completely defined by the mapping $q = F^{-1} \circ \Phi$ and
the covariance function $\rho_n$ of $N_n$.

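The construction of the Gaussian field $N_n$ in (6.34) can be based on spherical
harmonics. Let $\{X_n(\theta, \varphi)\}$, $n = 0, 1, \ldots$, $(\theta, \varphi) \in [0, \pi] \times [0, 2\pi]$, be a sequence
of Gaussian random fields defined by

$$ X_n(\theta, \varphi) = \sum_{k=0}^{n} \Big[ A_{k,0}^{(s)} P_k(\cos(\theta)) + \sum_{h=1}^{k} \big( A_{k,h}^{(s)} \cos(h\varphi) + B_{k,h}^{(s)} \sin(h\varphi) \big) P_{k,h}(\cos(\theta)) \Big], \tag{6.36} $$

where $(A_{k,0}^{(s)}, A_{k,h}^{(s)}, B_{k,h}^{(s)})$ are Gaussian random variables and $A_{0,0}^{(s)} = 1$. The normalized
version,

$$ N_n(\theta, \varphi) = \frac{X_n(\theta, \varphi) - E[X_n(\theta, \varphi)]}{\sqrt{E\big[ (X_n(\theta, \varphi) - E[X_n(\theta, \varphi)])^2 \big]}} \tag{6.37} $$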


of $X_n$ is a Gaussian random field with mean 0, variance 1, and covariance function
$\rho_n((\theta, \varphi), (\theta', \varphi'))$. Numerical experiments suggest that the scaled covariance
function of $H(\theta, \varphi)$ does not differ significantly from $\rho_n$ ([12], Sect. 3.1.1). For
simplicity, we approximate $\rho_n$ by this scaled covariance function. In summary, the
random fields $H_{T,n}$ in (6.34) have the required marginal distribution for all $n \geq 0$ and
their covariance function differs slightly from that of $H$. The random field $G(\theta, \varphi)$
in (6.33) is approximated by the translation model

$$ G_{T,n}(\theta, \varphi) = Z_1 H_{T,n}(\theta, \varphi), \quad (\theta, \varphi) \in [0, \pi] \times [0, 2\pi], \tag{6.38} $$

with $H_{T,n}$ in (6.34). Note that the model in (6.38) is physically consistent in the sense
that $G_{T,n}(\theta, \varphi) \geq 0$ a.s. The distribution of $Z_1$ can be estimated from its samples in
the first row of $a_{\text{coeff}}$. The marginal distribution and the covariance functions of $H_{T,n}$
can be estimated from rows 2 to 961 of $a_{\text{coeff}}$ scaled by the first row of this matrix.

The model $G_{T,n}(\theta, \varphi)$ calibrated to the data set in matrix $a_{\text{coeff}}$ can be used to
generate virtual aggregates that are statistically consistent with the available population.
The generation algorithm involves two steps. First, samples of $N_n(\theta, \varphi)$ and
$Z_1$ need to be generated. Second, samples of $G_{T,n}(\theta, \varphi)$ need to be calculated from
(6.38) and samples of $N_n(\theta, \varphi)$ and $Z_1$. Figure 6.11 shows four virtual aggregates,
that is, four samples of $G_{T,n}(\theta, \varphi)$ in (6.38) with $n = 30$.
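The second step of the generation algorithm is the pointwise map in (6.34) followed by the scaling in (6.38). The sketch below shows this map at a single point $(\theta, \varphi)$. It is only an illustration under stated assumptions: the marginal distribution $F$ is taken to be a unit-rate exponential and $Z_1$ is drawn from a lognormal distribution, both hypothetical stand-ins for the distributions estimated from $a_{\text{coeff}}$.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # provides Phi, the distribution of N(0, 1)


def inv_marginal(p: float) -> float:
    """Hypothetical F^{-1}: inverse of a unit-rate exponential distribution."""
    return -math.log1p(-p)


def translate(n_value: float) -> float:
    """H_T = F^{-1}(Phi(N)) as in (6.34), evaluated at one point."""
    return inv_marginal(STD_NORMAL.cdf(n_value))


def virtual_radius(n_value: float, z1: float) -> float:
    """G_T = Z_1 * H_T as in (6.38)."""
    return z1 * translate(n_value)


rng = random.Random(0)
n_sample = rng.gauss(0.0, 1.0)              # stand-in for N_n(theta, phi)
z1_sample = rng.lognormvariate(0.0, 0.25)   # stand-in for a sample of Z_1
print(virtual_radius(n_sample, z1_sample))
```

Because $F^{-1} \circ \Phi$ is nondecreasing and maps into $[0, \infty)$ for this choice of $F$, the resulting radius is nonnegative, in line with the physical consistency noted for (6.38).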


6.3.3 Probabilistic Models for Microstructures 

Consider the stochastic elliptic boundary value problem

$$ -\nabla \cdot (A(x) \nabla U(x)) = b(x), \quad x \in D, \qquad U(x) = 0, \quad x \in \partial D, \tag{6.39} $$

where $D$ is a bounded subset of $\mathbb{R}^d$, $A(x)$, $x \in D$, is a real-valued random field on
a probability space $(\Omega, \mathcal{F}, P)$, and $b(x)$, $x \in D$, denotes a real-valued continuous
function. We discuss stochastic equations of the type in (6.39) in Sect. 9.4. Most
studies of (6.39) focus on mathematical conditions that $A(x)$ must satisfy such that
this equation admits a unique weak solution [1-3, 7, 9, 10, 35]. Physical requirements
on $A(x)$, which relate to, for example, features of microstructures that define the probability
law of $A(x)$ and can be inferred from data and/or any other available information
([19, 20] and [39], Chaps. 8 and 13), have not been examined systematically.

Fig. 6.11 Virtual aggregates given by spherical harmonic representations with $n = 30$

A key assumption in [1-3, 7, 9, 10, 35], referred to as the finite dimensional noise
assumption, is that $A(x)$ can be represented approximately by linear models of the
type

$$ A^{(n)}(x, \omega) = \sum_{k=1}^{n} C_k(\omega)\, \theta_k(x), \quad x \in D, \ \omega \in \Omega, \tag{6.40} $$

where $n \geq 1$ is an integer, $\{C_k(\omega)\}$ are random variables on $(\Omega, \mathcal{F}, P)$ that are
usually assumed to be independent, and $\{\theta_k(x)\}$ denote specified deterministic functions.
In contrast to $A(x)$ that usually consists of an uncountable number of random
variables, $A^{(n)}(x)$ depends on a finite number of random variables. Truncated
Karhunen-Loève expansions and discrete spectral representations (Sects. 3.6.4,
3.6.5) are examples of linear models of the type in (6.40).

Let $U^{(n)}(x)$ denote the solution of (6.39) with $A^{(n)}(x)$ in place of $A(x)$, that is,

$$ -\nabla \cdot \big( A^{(n)}(x, y) \nabla U^{(n)}(x, y) \big) = b(x), \quad (x, y) \in D \times \Gamma, \qquad U^{(n)}(x, y) = 0, \quad (x, y) \in \partial D \times \Gamma, \tag{6.41} $$

where $\Gamma = (C_1, \ldots, C_n)(\Omega) \subset \mathbb{R}^n$ denotes the image of the random variables in the
expression of $A^{(n)}(x)$. Note that under the finite dimensional noise assumption the
stochastic partial differential equation in (6.39) becomes the deterministic parametric
elliptic partial differential equation in (6.41) defined on $D \times \Gamma$ [3]. The measure on
$D \times \Gamma$ is the product of the Lebesgue measure on $D$ and the probability measure $f(y)\, dy$
on $\Gamma$, where $f(y)$ denotes the joint probability density function of $(C_1, \ldots, C_n)$. To
assure the existence and uniqueness of the solution of (6.41), the finite dimensional
noise assumption is supplemented by additional constraints on the samples of $A^{(n)}(x)$
[1-3, 7, 9, 10, 15, 35].

We examine potential limitations of the linear model in (6.40) with independent
random coefficients $\{C_k\}$, and develop alternative models for $A(x)$. It is shown that
linear models with independent random coefficients of the type in (6.40) are approximately
Gaussian, so that they may be unsatisfactory in many applications, and that
efficient algorithms can be developed for constructing linear models that are consistent
with both mathematical and physical constraints.


6.3.3.1 Linear Models with Independent Coefficients 

Suppose $A(x)$ in (6.39) is a real-valued random field with correlation function that is
continuous and square integrable in $D \times D$. Then $A(x)$ admits the Karhunen-Loève
expansion

$$ A(x, \omega) = E[A(x)] + \sum_{k=1}^{\infty} \lambda_k^{1/2}\, \theta_k(x)\, Y_k(\omega), \quad x \in D, \tag{6.42} $$

where the equality is understood in m.s., $\{Y_k\}$ are uncorrelated random variables with
mean 0 and variance 1, and $\{\lambda_k, \theta_k(x)\}$ denote the eigenvalues and eigenfunctions
of the correlation function of $A(x)$ ([21], Sect. 6.2, and Sect. 3.6.5 in this book).
Truncations,

$$ A^{(n)}(x, \omega) = E[A(x, \omega)] + \sum_{k=1}^{n} \lambda_k^{1/2}\, \theta_k(x)\, Y_k(\omega), \quad x \in D, \tag{6.43} $$

of the representation of $A(x)$ in (6.42) produce linear models for $A(x)$ of the type in
(6.40). Most developments on stochastic Galerkin and collocation methods assume
that the random variables $\{Y_k\}$ in (6.42) and (6.43) are independent and take values
in bounded intervals. The assumption of independence is valid for Gaussian fields
but is invalid for non-Gaussian random fields. If $A(x)$ is a Gaussian field, the random
variables $\{Y_k\}$ are Gaussian so that they cannot be bounded.




Linear models similar to those in (6.43) can be obtained by the spectral representation
theorem for weakly stationary random fields ([13] and Sect. 3.6.4 in this
book). These models are trigonometric polynomials with random coefficients. Since
our discussion is limited to homogeneous random fields $A(x)$, we only consider
linear models derived from the spectral representation theorem. For simplicity, it is
assumed that $D$ in (6.40) is an interval of the real line.

Let $A(x)$ be a real-valued weakly stationary random field with mean 0, variance 1,
and one-sided spectral density $g(\nu)$ with frequency band $(0, \bar{\nu})$, $0 < \bar{\nu} < \infty$. Define
the sequence of random fields

$$ A^{(n)}(x) = \sum_{k=1}^{n} \sigma_k^{(n)} \big( A_{c,k}^{(n)} \cos(\nu_k^{(n)} x) + A_{s,k}^{(n)} \sin(\nu_k^{(n)} x) \big), \quad x \in D, \ n = 1, 2, \ldots, \tag{6.44} $$

where $\{A_{c,k}^{(n)}, A_{s,k}^{(n)}\}$ are uncorrelated random variables with mean 0 and variance
1, $\Delta\nu = \bar{\nu}/n$, $\nu_k^{(n)} = (k - 1/2)\Delta\nu$, and $(\sigma_k^{(n)})^2 = \int_{(k-1)\Delta\nu}^{k\Delta\nu} g(\nu)\, d\nu$, $k = 1, \ldots, n$.
The mean and covariance functions of $A^{(n)}(x)$ are $E[A^{(n)}(x)] = 0$ and
$E[A^{(n)}(x) A^{(n)}(x')] = \sum_{k=1}^{n} (\sigma_k^{(n)})^2 \cos(\nu_k^{(n)} (x - x'))$, so that $A^{(n)}(x)$ has unit variance.
The first two moments of $A^{(n)}(x)$ converge to those of $A(x)$ as $n \to \infty$ for
$\bar{\nu} < \infty$, and this convergence extends to the case in which $A(x)$ has an unbounded
frequency band (Sect. 3.8.1). If $A(x)$ is a Gaussian field, then $\{A_{c,k}^{(n)}, A_{s,k}^{(n)}\}$ are independent
$N(0, 1)$ variables. Otherwise, $\{A_{c,k}^{(n)}, A_{s,k}^{(n)}\}$ are uncorrelated but dependent
random variables.
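The construction in (6.44) can be sketched numerically as follows. The example assumes, purely for illustration, a flat one-sided spectral density $g(\nu) = 1/\bar{\nu}$ on $(0, \bar{\nu})$, so that $(\sigma_k^{(n)})^2 = 1/n$, and Gaussian coefficients; the function names are hypothetical.

```python
import math
import random


def spectral_model(n, nu_bar, rng):
    """Frequencies, amplitudes, and Gaussian coefficients of (6.44).

    Assumes a flat one-sided spectral density g(nu) = 1/nu_bar on (0, nu_bar),
    so that each frequency band carries variance sigma_k^2 = 1/n.
    """
    d_nu = nu_bar / n
    freqs = [(k - 0.5) * d_nu for k in range(1, n + 1)]
    sigmas = [math.sqrt(1.0 / n)] * n
    coeffs = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
    return freqs, sigmas, coeffs


def sample_path(x, freqs, sigmas, coeffs):
    """One sample of A^(n)(x) for the given coefficient samples."""
    return sum(s * (a_c * math.cos(v * x) + a_s * math.sin(v * x))
               for v, s, (a_c, a_s) in zip(freqs, sigmas, coeffs))


def model_covariance(lag, freqs, sigmas):
    """E[A^(n)(x) A^(n)(x + lag)] = sum_k sigma_k^2 cos(nu_k lag)."""
    return sum(s * s * math.cos(v * lag) for v, s in zip(freqs, sigmas))


rng = random.Random(1)
freqs, sigmas, coeffs = spectral_model(200, 10.0, rng)
print(sample_path(0.5, freqs, sigmas, coeffs))
print(model_covariance(0.0, freqs, sigmas))   # unit variance
print(model_covariance(0.3, freqs, sigmas))   # close to sin(3)/3 for this g
```

For the flat density assumed here the covariance function of the target field is $\sin(\bar{\nu}\tau)/(\bar{\nu}\tau)$, which the model covariance approaches as $n$ increases.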

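Suppose, following current developments, that $\{A_{c,k}^{(n)}, A_{s,k}^{(n)}\}$ are independent random
variables with specified distributions. The characteristic function of $A^{(n)}(x)$ is

$$ \varphi_{A^{(n)}(x)}(u) = E\big[ e^{i u A^{(n)}(x)} \big] = \prod_{k=1}^{n} \varphi_{A_{c,k}^{(n)}}\big( u\, \sigma_k^{(n)} \cos(\nu_k^{(n)} x) \big)\, \varphi_{A_{s,k}^{(n)}}\big( u\, \sigma_k^{(n)} \sin(\nu_k^{(n)} x) \big), \tag{6.45} $$

where $\varphi_{A_{c,k}^{(n)}}$ and $\varphi_{A_{s,k}^{(n)}}$ denote the characteristic functions of $A_{c,k}^{(n)}$ and $A_{s,k}^{(n)}$.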


Example 6.3 Suppose $\{A_{c,k}^{(n)}, A_{s,k}^{(n)}\}$ are independent random variables following a
Beta distribution with shape parameters $(p > 0, q > 0)$ and range $[a, b]$, so that
$A_{c,k}^{(n)} \overset{d}{=} A_{s,k}^{(n)} \overset{d}{=} Y$, $Y = a + (b - a)Z$, and $Z$ is a standard Beta variable with parameters
$(p, q)$. The density of $Z$ is $f(z) = z^{p-1} (1 - z)^{q-1} / B(p, q)$, $z \in [0, 1]$,
with mean $E[Z] = p/(p + q)$ and variance $\mathrm{Var}[Z] = pq/[(p + q)^2 (p + q + 1)]$
([25], Chap. 24). If the parameters $(a, b, p, q)$ satisfy the conditions
$a = -(p/q)\, b$, $p = q(q + 1)/(b^2 - q)$, and $b^2 > q$, the random variable $Y$ has
mean 0 and variance 1, for example, $E[Y] = 0$ and $E[Y^2] = 1$ for ($a = -1.4545$,
$b = 8$, $p = 1.6364$, $q = 9$). The density of the standard Beta variable
$Z$ for these parameters is highly skewed, with most of its mass near the left end of $[0, 1]$,
as illustrated in Fig. 6.12.
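The parameter conditions of Example 6.3 can be checked directly from the mean and variance of the standard Beta variable. The following sketch does this for $b = 8$ and $q = 9$; the helper names are illustrative.

```python
def beta_parameters(b: float, q: float):
    """Return (a, p) from a = -(p/q) b and p = q (q + 1) / (b^2 - q)."""
    if b * b <= q:
        raise ValueError("the conditions require b**2 > q")
    p = q * (q + 1.0) / (b * b - q)
    a = -(p / q) * b
    return a, p


def moments_of_y(a: float, b: float, p: float, q: float):
    """Mean and variance of Y = a + (b - a) Z with Z ~ Beta(p, q)."""
    mean_z = p / (p + q)
    var_z = p * q / ((p + q) ** 2 * (p + q + 1.0))
    return a + (b - a) * mean_z, (b - a) ** 2 * var_z


a, p = beta_parameters(8.0, 9.0)
print(round(a, 4), round(p, 4))       # parameters quoted in the example
print(moments_of_y(a, 8.0, p, 9.0))   # mean 0 and variance 1 up to rounding
```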

The characteristic function of $A^{(n)}(x)$ is given by (6.45) with $\varphi_{A_{c,k}^{(n)}}$ and $\varphi_{A_{s,k}^{(n)}}$
equal to the characteristic function of $Y$. Figure 6.13 shows with heavy dotted lines
the characteristic function of the standard normal variable. The thin solid and dotted
lines in the figure are real and imaginary parts of $\varphi_{A^{(n)}(x)}(u)$ at 20 equally spaced
arguments $x \in [0, \tau]$, $\tau = 10$, for $n = 1$ (left panel) and $n = 10$ (right panel). As $n$
increases, the marginal distribution of $A^{(n)}(x)$ rapidly approaches the distribution of
$N(0, 1)$ and becomes space invariant. ◇

Fig. 6.12 Density of $Z$ with shape parameters ($p = 1.6364$, $q = 9.0$)

Fig. 6.13 Real and imaginary parts of the characteristic functions $\varphi_{A^{(n)}(x)}(u)$ are shown with thin
solid and thin dotted lines, respectively, for $n = 1$ (left panel) and $n = 10$ (right panel). Heavy
dotted lines are the characteristic function of $N(0, 1)$

The numerical experiments in Example 6.3 are consistent with the following two
theorems showing that $A^{(n)}(x)$ becomes a stationary Gaussian random function as
$n \to \infty$ under some mild conditions on $A_{c,k}^{(n)}$ and $A_{s,k}^{(n)}$. Consider the alternative form,

$$ A^{(n)}(x) = \sum_{k=1}^{n} \big( U_k^{(n)}(x) + V_k^{(n)}(x) \big), \tag{6.46} $$

of the linear model $A^{(n)}(x)$ in (6.44), where $U_k^{(n)}(x) = \sigma_k^{(n)} A_{c,k}^{(n)} \cos(\nu_k^{(n)} x)$ and
$V_k^{(n)}(x) = \sigma_k^{(n)} A_{s,k}^{(n)} \sin(\nu_k^{(n)} x)$. Let $F_{U_k^{(n)}(x)}$ and $F_{V_k^{(n)}(x)}$ denote the distributions of
random variables $U_k^{(n)}(x)$ and $V_k^{(n)}(x)$. These distributions have means 0 and variances
$(\sigma_k^{(n)} \cos(\nu_k^{(n)} x))^2 \sim O(n^{-1})$ and $(\sigma_k^{(n)} \sin(\nu_k^{(n)} x))^2 \sim O(n^{-1})$, respectively.
Note that the random variables $U_k^{(n)}(x)$ and $V_k^{(n)}(x)$ are uncorrelated, $E[A^{(n)}(x)] = 0$,
and $E[A^{(n)}(x)^2] = 1$.

Theorem 6.1 If $\{A_{c,k}^{(n)}, A_{s,k}^{(n)}\}$ in (6.44) are independent random variables and the
distributions $F_{U_k^{(n)}(x)}$ and $F_{V_k^{(n)}(x)}$ are such that $\int_{|z| > \xi} z^2\, F_{U_k^{(n)}(x)}(dz) \sim O(n^{-(1+\alpha)})$
and $\int_{|z| > \xi} z^2\, F_{V_k^{(n)}(x)}(dz) \sim O(n^{-(1+\alpha)})$ hold as $n \to \infty$ for $\alpha > 0$ and arbitrary
$\xi > 0$, then $A^{(n)}(x) \Rightarrow N(0, 1)$ at each $x \in D$ as $n \to \infty$.

Proof Set $S_n = \sum_{k=1}^{n} Z_k$ and $s_n^2 = \sum_{k=1}^{n} \rho_k^2$, where $\{Z_k, k \geq 1\}$ are independent
random variables with means $E[Z_k] = 0$, variances $\rho_k^2 = E[Z_k^2]$, and distributions
$F_k$. If the Lindeberg-Feller condition is satisfied, that is,

$$ \frac{1}{s_n^2} \sum_{k=1}^{n} E\big[ Z_k^2\, 1(|Z_k / s_n| > \xi) \big] = \frac{1}{s_n^2} \sum_{k=1}^{n} \int_{|z| > \xi s_n} z^2\, F_k(dz) \to 0, \quad \text{as } n \to \infty, \tag{6.47} $$

for all $\xi > 0$, then $S_n / s_n \Rightarrow N(0, 1)$ as $n \to \infty$, that is, the sequence of distributions
of $S_n / s_n$ converges to the distribution of the standard Gaussian variable
$N(0, 1)$ as $n$ increases indefinitely ([33], Theorem 9.8.1).

We show that the Lindeberg-Feller condition holds for the sequence of independent
random variables $\{U_k^{(n)}(x), V_k^{(n)}(x)\}$ in the definition of $A^{(n)}(x)$. For our case,
the parameters $s_n^2$ and the integrals $\int z^2\, F_k(dz)$ in (6.47) are $E[A^{(n)}(x)^2] = 1$ and
$\int_{|z| > \xi} z^2\, F_{U_k^{(n)}(x)}(dz)$, $\int_{|z| > \xi} z^2\, F_{V_k^{(n)}(x)}(dz)$, respectively. Accordingly, the summation
$(1/s_n^2) \sum_{k=1}^{n} \int z^2\, F_k(dz)$ is of order $\sum_{k=1}^{n} \big( O(n^{-(1+\alpha)}) + O(n^{-(1+\alpha)}) \big) = O(n^{-\alpha})$,
so that $A^{(n)}(x) \Rightarrow N(0, 1)$ as $n \to \infty$ at each $x \in D$. The assumption
that $A(x)$ has a bounded frequency band $[0, \bar{\nu}]$, $\bar{\nu} < \infty$, can be relaxed since our
arguments hold for an arbitrary $\bar{\nu}$ so that they also apply in the limit as $\bar{\nu} \to \infty$. ▲

Note that the conditions of Theorem 6.1 hold if the random variables $(A_{c,k}^{(n)}, A_{s,k}^{(n)})$ in (6.44)
are bounded a.s., that is, if there exists a finite $a > 0$ such that $P(|A_{c,k}^{(n)}| > a) = P(|A_{s,k}^{(n)}| > a) = 0$,
which implies $|U_k^{(n)}(x)| \leq \sigma_k^{(n)} a$ and $|V_k^{(n)}(x)| \leq \sigma_k^{(n)} a$ a.s. Since $\sigma_k^{(n)} \sim O(n^{-1/2})$,
there exists an integer $n_\xi \geq 1$ for each $\xi > 0$ such that $\sigma_k^{(n)} a < \xi$ for $n \geq n_\xi$. Hence,
$A^{(n)}(x) \Rightarrow N(0, 1)$ as $n \to \infty$ by the Lindeberg-Feller condition, since the integrals
$\int_{|z| > \xi} z^2\, F_{U_k^{(n)}(x)}(dz)$ and $\int_{|z| > \xi} z^2\, F_{V_k^{(n)}(x)}(dz)$ are equal to 0 for $n \geq n_\xi$.
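The approach to the Gaussian distribution can be quantified through the skewness of $A^{(n)}(x)$. The sketch below is an illustration that is not taken from the text: it assumes the flat spectral density $g(\nu) = 1/\bar{\nu}$, so that $\sigma_k^{(n)} = n^{-1/2}$, and the Beta coefficients of Example 6.3. Since third cumulants of independent variables add and $A^{(n)}(x)$ has unit variance, its skewness is $\gamma_Y\, n^{-3/2} \sum_k [\cos^3(\nu_k^{(n)} x) + \sin^3(\nu_k^{(n)} x)]$, where $\gamma_Y$ is the skewness of $Y$, and is bounded in magnitude by $|\gamma_Y| / \sqrt{n}$.

```python
import math


def beta_skewness(p: float, q: float) -> float:
    """Skewness of a Beta(p, q) variable (unchanged by Y = a + (b - a) Z, b > a)."""
    return (2.0 * (q - p) * math.sqrt(p + q + 1.0)
            / ((p + q + 2.0) * math.sqrt(p * q)))


def model_skewness(n: int, x: float, nu_bar: float, skew_y: float) -> float:
    """Skewness of A^(n)(x) in (6.44) under the assumed flat spectrum."""
    d_nu = nu_bar / n
    total = 0.0
    for k in range(1, n + 1):
        angle = (k - 0.5) * d_nu * x
        total += math.cos(angle) ** 3 + math.sin(angle) ** 3
    return skew_y * total * n ** -1.5


skew_y = beta_skewness(1.6364, 9.0)
for n in (1, 10, 100, 1000):
    print(n, model_skewness(n, 0.7, 10.0, skew_y), skew_y / math.sqrt(n))
```

The second printed column stays below the third in magnitude, so the skewness of the linear model vanishes at least as fast as $n^{-1/2}$ regardless of the skewness of the coefficients.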

Theorem 6.2 Under the conditions of Theorem 6.1, $A^{(n)}(x)$ converges to a stationary
Gaussian random function as $n \to \infty$ with the second moment properties of
$A(x)$.

Proof We need to show that for any integer $m \geq 1$, arguments $x_1, \ldots, x_m \in D$, and
coefficients $\beta_1, \ldots, \beta_m \in \mathbb{R}$, the random variable

$$ S_{m,n} = \sum_{i=1}^{m} \beta_i A^{(n)}(x_i) = \sum_{k=1}^{n} \sum_{i=1}^{m} \beta_i \big( U_k^{(n)}(x_i) + V_k^{(n)}(x_i) \big) $$

is Gaussian ([23], Sect. 5.5). The mean and variance of $S_{m,n}$ are $E[S_{m,n}] = \sum_{i=1}^{m} \beta_i E[A^{(n)}(x_i)] = 0$ and

$$ \mathrm{Var}[S_{m,n}] = \sum_{i,j=1}^{m} \beta_i \beta_j E[A^{(n)}(x_i) A^{(n)}(x_j)] = \sum_{i,j=1}^{m} \beta_i \beta_j \sum_{k=1}^{n} (\sigma_k^{(n)})^2 \cos\big( \nu_k^{(n)} (x_i - x_j) \big) $$
$$ \leq \sum_{i,j=1}^{m} |\beta_i \beta_j| \sum_{k=1}^{n} (\sigma_k^{(n)})^2 \big| \cos\big( \nu_k^{(n)} (x_i - x_j) \big) \big| \leq \Big( \sum_{i=1}^{m} |\beta_i| \Big)^2 < \infty $$

since $\sum_{k=1}^{n} (\sigma_k^{(n)})^2 = 1$ by assumption. Note that $\{\beta_i U_k^{(n)}(x_i)\}$ and $\{\beta_i V_k^{(n)}(x_i)\}$
play the role of the random variables $\{Z_k\}$ in (6.47). The Lindeberg-Feller condition
requires that

$$ \frac{1}{\mathrm{Var}[S_{m,n}]} \sum_{k=1}^{n} \sum_{i=1}^{m} \beta_i^2 \Big[ \int_{|z| > \xi} z^2\, F_{U_k^{(n)}(x_i)}(dz) + \int_{|z| > \xi} z^2\, F_{V_k^{(n)}(x_i)}(dz) \Big] $$

converges to 0 as $n \to \infty$ for all $\xi > 0$. Under the conditions of the previous
theorem, we have this convergence so that $A^{(n)}(x)$ becomes a Gaussian function for
large values of $n$. Since the second moment properties of $A^{(n)}(x)$ in (6.44) converge
to those of $A(x)$, the limit of $A^{(n)}(x)$ as $n \to \infty$ is a Gaussian field that is equal to
$A(x)$ in the second moment sense. ▲


6.3.3.2 Linear Models with Dependent Coefficients 

Let $A(x)$, $x \in D$, be a real-valued random field with finite variance defined on a
bounded interval $D$ of the real line, and let $A^{(n)}(x) = \sum_{k=1}^{n} C_k \theta_k(x)$ be a linear
model of the type in (6.40). The coefficients $\{C_k\}$ are uncorrelated but dependent
random variables, unless $A(x)$ is a stationary Gaussian field. As previously mentioned,
most methods for solving stochastic elliptic partial differential equations assume
that $\{C_k\}$ are independent random variables. There may be two reasons for this
assumption. First, the construction of the joint distribution of $(C_1, \ldots, C_n)$ such that
the probability law of $A^{(n)}(x)$ matches that of a target random field $A(x)$ is difficult.
Second, the calculation of expectations of the type $E[h(C_1, \ldots, C_n)]$ involved in
numerical solutions of (6.39) and (6.41) is simpler if $\{C_k\}$ are independent rather than
dependent random variables, where $h : \mathbb{R}^n \to \mathbb{R}$ denotes a measurable function.

Let $A(x, \omega_s)$, $s = 1, \ldots, n_s$, be $n_s$ independent samples of $A(x)$, $x \in D$.

Our objective is to construct samples $(C_1(\omega_s), \ldots, C_n(\omega_s))$ of $(C_1, \ldots, C_n)$ such
that the corresponding samples $A^{(n)}(x, \omega_s) = \sum_{k=1}^{n} C_k(\omega_s) \theta_k(x)$, $s = 1, \ldots, n_s$,
of $A^{(n)}(x)$ are close to the samples $A(x, \omega_s)$, $s = 1, \ldots, n_s$, of $A(x)$. For example, we
may select $(C_1(\omega_s), \ldots, C_n(\omega_s))$ to minimize a distance $d(A(x, \omega_s), A^{(n)}(x, \omega_s))$,
$s = 1, \ldots, n_s$, between samples of $A(x)$ and $A^{(n)}(x)$. If $A(x)$ has continuous samples
and the functions $\{\theta_k(x)\}$ are continuous, the discrepancy between $A(x, \omega_s)$ and
$A^{(n)}(x, \omega_s)$ can be measured by

$$ d(A(x, \omega_s), A^{(n)}(x, \omega_s)) = \sup_{x \in D} \Big| A(x, \omega_s) - \sum_{k=1}^{n} C_k(\omega_s) \theta_k(x) \Big|, \tag{6.48} $$

that is, the sup metric in the space $C(D)$ of real-valued continuous functions defined on
$D$. Optimization algorithms can be used to find $\{C_k(\omega_s)\}$ minimizing the objective
function in (6.48). Alternative objective functions can be used to construct samples
of $(C_1, \ldots, C_n)$, for example, objective functions depending on differences
between samples of $A(x)$ and $A^{(n)}(x)$ and of $dA(x)/dx$ and $dA^{(n)}(x)/dx$ provided
that almost all samples of $A(x)$ and $dA(x)/dx$ and the derivatives of the basis functions
$\{\theta_k(x)\}$ are continuous functions. The optimization problem has a unique
solution. For example, if the basis functions $\{\theta_k(x)\}$ are algebraic or trigonometric
polynomials, then for every $\varepsilon > 0$ there exists an integer $n(\varepsilon, \omega_s) \geq 1$ such that
$d(A(x, \omega_s), A^{(n)}(x, \omega_s)) < \varepsilon$ ([4], Theorem 6.1.1). Moreover, the coefficients
$(C_1(\omega_s), \ldots, C_n(\omega_s))$ of the polynomial with this property are unique ([4], Corollary
7.4.2). Similar results can be found elsewhere ([30], Theorem 13.1 and Sect. 14.3).
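A discretized version of the objective function in (6.48) is easy to evaluate on a grid. The sketch below is a simplified illustration, not the optimization algorithm used for the results in this section: it takes a hypothetical sample path $A(x, \omega_s) = e^x$ on $D = [0, 1]$ and a single constant basis function, for which the minimizer of the sup distance is known to be the midrange of the sample.

```python
import math


def sup_distance(sample, coeffs, basis, grid):
    """Grid approximation of (6.48): max_x |A(x) - sum_k C_k theta_k(x)|."""
    return max(
        abs(sample(x) - sum(c * theta(x) for c, theta in zip(coeffs, basis)))
        for x in grid
    )


def sample_path(x):
    """Hypothetical sample A(x, omega_s) used only for illustration."""
    return math.exp(x)


def constant(x):
    """Single basis function theta_1(x) = 1."""
    return 1.0


grid = [i / 200 for i in range(201)]
basis = [constant]
values = [sample_path(x) for x in grid]
best_c = 0.5 * (max(values) + min(values))  # midrange minimizes the sup distance

print(sup_distance(sample_path, [best_c], basis, grid))
print(sup_distance(sample_path, [best_c + 0.05], basis, grid))
```

With more basis functions the minimization has no closed form in general and requires a numerical method, for example a linear program over the grid values.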

Let $(C_1(\omega_s), \ldots, C_n(\omega_s))$, $s = 1, \ldots, n_s$, be $n_s$ samples of $(C_1, \ldots, C_n)$
corresponding to $n_s$ independent samples $A(x, \omega_s)$, $s = 1, \ldots, n_s$, of a target random
field $A(x)$ delivered by an optimization algorithm with objective function of
the type in (6.48). The samples of $(C_1, \ldots, C_n)$ can be stored for later processing
and/or used to estimate statistics of this random vector. We construct estimators for
the characteristic and the distribution functions of $(C_1, \ldots, C_n)$ and of the linear
model $A^{(n)}(x)$.

Let

$$ \hat{\varphi}_{C_1, \ldots, C_n}(u_1, \ldots, u_n) = \frac{1}{n_s} \sum_{s=1}^{n_s} e^{i \sum_{k=1}^{n} u_k C_k^{(s)}}, \quad (u_1, \ldots, u_n) \in \mathbb{R}^n, \tag{6.49} $$

be an estimator of the characteristic function

$$ \varphi_{C_1, \ldots, C_n}(u_1, \ldots, u_n) = E\big[ e^{i \sum_{k=1}^{n} u_k C_k} \big], \quad (u_1, \ldots, u_n) \in \mathbb{R}^n, \tag{6.50} $$

of the random vector $(C_1, \ldots, C_n)$, where $(C_1^{(s)}, \ldots, C_n^{(s)})$, $s = 1, \ldots, n_s$, are
independent copies of this vector. Let also

$$ \hat{\varphi}_{A^{(n)}(x_1), \ldots, A^{(n)}(x_m)}(u_1, \ldots, u_m) = \hat{\varphi}_{C_1, \ldots, C_n}\Big( \sum_{r=1}^{m} u_r \theta_1(x_r), \ldots, \sum_{r=1}^{m} u_r \theta_n(x_r) \Big), \tag{6.51} $$

be an estimator of the characteristic function

$$ \varphi_{A^{(n)}(x_1), \ldots, A^{(n)}(x_m)}(u_1, \ldots, u_m) = E\big[ e^{i \sum_{r=1}^{m} u_r A^{(n)}(x_r)} \big] = \varphi_{C_1, \ldots, C_n}\Big( \sum_{r=1}^{m} u_r \theta_1(x_r), \ldots, \sum_{r=1}^{m} u_r \theta_n(x_r) \Big) \tag{6.52} $$

of $(A^{(n)}(x_1), \ldots, A^{(n)}(x_m))$, where $m \geq 1$ is an integer and $x_1, \ldots, x_m \in D$ are distinct
points. Similar estimators can be constructed for the distribution of $(C_1, \ldots, C_n)$.
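The estimators in (6.49) and (6.51) translate directly into code. The sketch below is an illustration under stated assumptions: the coefficient samples are drawn from a uniform distribution and the basis functions are $\theta_1(x) = 1$ and $\theta_2(x) = x$, both hypothetical choices made only to have data to work with.

```python
import cmath
import random


def ecf_coefficients(u, samples):
    """Estimator (6.49): sample average of exp(i * sum_k u_k C_k^(s))."""
    total = 0j
    for c in samples:
        total += cmath.exp(1j * sum(u_k * c_k for u_k, c_k in zip(u, c)))
    return total / len(samples)


def ecf_model(u, points, basis, samples):
    """Estimator (6.51): (6.49) evaluated at the arguments sum_r u_r theta_k(x_r)."""
    args = [sum(u_r * theta(x_r) for u_r, x_r in zip(u, points)) for theta in basis]
    return ecf_coefficients(args, samples)


def theta_1(x):
    return 1.0


def theta_2(x):
    return x


rng = random.Random(3)
# Hypothetical samples of (C_1, C_2); in the text they come from minimizing (6.48).
samples = [(rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)) for _ in range(200)]

print(ecf_coefficients((0.0, 0.0), samples))                 # equals 1 at the origin
print(ecf_model((2.0,), (0.4,), [theta_1, theta_2], samples))
```

For $m = 1$ the second call estimates the marginal characteristic function of $A^{(n)}(x)$ at $x = 0.4$ and $u = 2$.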

Theorem 6.3 The estimators $\hat{\varphi}_{C_1, \ldots, C_n}(u_1, \ldots, u_n)$ and $\hat{\varphi}_{A^{(n)}(x_1), \ldots, A^{(n)}(x_m)}(u_1, \ldots, u_m)$
in (6.49) and (6.51) are unbiased and weakly consistent. The discrepancy
between the characteristic functions of $A(x)$ and $A^{(n)}(x)$ can be bounded by

$$ E\big[ |\varphi_A(u) - \hat{\varphi}_{A^{(n)}(x)}(u)| \big] \leq |\varphi_A(u) - \varphi_{A^{(n)}(x)}(u)| + \big( \mathrm{Var}[\hat{\varphi}_{A^{(n)}(x)}(u)] \big)^{1/2}. \tag{6.53} $$

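Proof We show that the mean of $\hat{\varphi}_{C_1, \ldots, C_n}(u_1, \ldots, u_n)$ is $\varphi_{C_1, \ldots, C_n}(u_1, \ldots, u_n)$ and
that

$$ P\big( |\hat{\varphi}_{C_1, \ldots, C_n}(u_1, \ldots, u_n) - \varphi_{C_1, \ldots, C_n}(u_1, \ldots, u_n)| > \varepsilon \big) \to 0 $$

as $n_s \to \infty$ for arbitrary $\varepsilon > 0$ ([33], Sect. 6.2.1). Similar conditions must be satisfied
by the estimator in (6.51). Elementary calculations give

$$ E[\hat{\varphi}_{C_1, \ldots, C_n}(u_1, \ldots, u_n)] = \varphi_{C_1, \ldots, C_n}(u_1, \ldots, u_n), $$
$$ \mathrm{Var}[\hat{\varphi}_{C_1, \ldots, C_n}(u_1, \ldots, u_n)] = \frac{1}{n_s} \big[ 1 - |\varphi_{C_1, \ldots, C_n}(u_1, \ldots, u_n)|^2 \big] \leq \frac{1}{n_s}, \tag{6.54} $$

so that the estimator $\hat{\varphi}_{C_1, \ldots, C_n}(u_1, \ldots, u_n)$ is unbiased. The Chebyshev inequality,

$$ P\big( |\hat{\varphi}_{C_1, \ldots, C_n}(u_1, \ldots, u_n) - \varphi_{C_1, \ldots, C_n}(u_1, \ldots, u_n)| > \varepsilon \big) \leq \mathrm{Var}[\hat{\varphi}_{C_1, \ldots, C_n}(u_1, \ldots, u_n)] / \varepsilon^2, $$

implies that the estimator is weakly consistent. Since the estimator in (6.51) is the
estimator $\hat{\varphi}_{C_1, \ldots, C_n}(u_1, \ldots, u_n)$ for particular arguments, it has the same properties
as this estimator.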

The discrepancy between the characteristic function $\varphi_{A(x)}(u) = \varphi_A(u)$ of $A(x)$
and the estimator of the marginal characteristic function of $A^{(n)}(x)$ in (6.52) for
$m = 1$ can be bounded by

$$ |\varphi_A(u) - \hat{\varphi}_{A^{(n)}(x)}(u)| \leq |\varphi_A(u) - \varphi_{A^{(n)}(x)}(u)| + |\varphi_{A^{(n)}(x)}(u) - \hat{\varphi}_{A^{(n)}(x)}(u)|, \tag{6.55} $$

so that

$$ E\big[ |\varphi_A(u) - \hat{\varphi}_{A^{(n)}(x)}(u)| \big] \leq |\varphi_A(u) - \varphi_{A^{(n)}(x)}(u)| + E\big[ |\varphi_{A^{(n)}(x)}(u) - \hat{\varphi}_{A^{(n)}(x)}(u)| \big]. \tag{6.56} $$

The bound in (6.53) results by applying the Cauchy-Schwarz inequality to the second
term on the right side of (6.56). The first and the second terms of the bound
on $E[|\varphi_A(u) - \hat{\varphi}_{A^{(n)}(x)}(u)|]$ relate to model accuracy and statistical uncertainty,
respectively. Since $\mathrm{Var}[\hat{\varphi}_{A^{(n)}(x)}(u)] \leq 1/n_s$, the second term can be made as small
as desired by increasing the sample size. ▲
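The variance bound in (6.54) combined with the Chebyshev inequality gives a simple, conservative rule for the sample size: $P(|\hat{\varphi} - \varphi| > \varepsilon) \leq 1/(n_s \varepsilon^2) \leq \delta$ whenever $n_s \geq 1/(\delta \varepsilon^2)$. The short sketch below evaluates this rule; the function name and the tolerance values are illustrative and not taken from the text.

```python
import math


def sample_size(eps: float, delta: float) -> int:
    """Smallest n_s with 1 / (n_s * eps^2) <= delta, from (6.54) and Chebyshev."""
    if eps <= 0.0 or delta <= 0.0:
        raise ValueError("eps and delta must be positive")
    return math.ceil(1.0 / (delta * eps * eps))


def statistical_term_bound(n_s: int) -> float:
    """Upper bound n_s^(-1/2) on the second term of (6.53)."""
    return 1.0 / math.sqrt(n_s)


print(sample_size(0.25, 0.5))
print(statistical_term_bound(50), statistical_term_bound(200))
```

The rule is distribution free and therefore pessimistic; it only indicates how the statistical term scales with $n_s$.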

Additional information on sample properties of $A(x)$ is needed to bound the discrepancy
$|\varphi_A(u) - \varphi_{A^{(n)}(x)}(u)|$ corresponding to model accuracy (Theorem 6.4).
Suppose

$$ A^{(n)}(x) = \frac{A_0^{(n)}}{2} + \sum_{k=1}^{n} \big( A_k^{(n)} \cos(\nu_k x) + B_k^{(n)} \sin(\nu_k x) \big), \quad x \in [0, T], \tag{6.57} $$

where $\{A_k^{(n)}, k = 0, 1, \ldots, n, \ B_k^{(n)}, k = 1, \ldots, n\}$ are random variables. The random
variables $\{A_k^{(n)}, B_k^{(n)}\}$ and the basis functions $(1, \ldots, \cos(\nu_k x), \ldots, \sin(\nu_k x), \ldots)$ in
(6.57) are similar to $\{C_k\}$ and $\{\theta_k(x)\}$ in (6.40). The samples of $A^{(n)}(x)$ are members
of the linear space

$$ \mathcal{T}_n[0, T] = \Big\{ q(x) = \frac{\alpha_0}{2} + \sum_{k=1}^{n} \big( \alpha_k \cos(\nu_k x) + \beta_k \sin(\nu_k x) \big) \Big\} \tag{6.58} $$

of trigonometric polynomials of order $n$ with period $T > 0$, where $\nu_1 = 2\pi/T$,
$\nu_k = k \nu_1$, and $\alpha_k, \beta_k \in \mathbb{R}$, $k = 1, 2, \ldots$. There are at least two reasons for
focusing on linear models of the type in (6.57). First, trigonometric polynomials
have been and are used extensively in applications to approximate deterministic
and random functions. Second, the model in (6.57) resembles the discrete spectral
representation for weakly stationary random fields and can be used to represent both
stationary and non-stationary non-Gaussian random fields [16].

The following algorithm can be applied to find samples $\{A_k^{(n)}(\omega), B_k^{(n)}(\omega)\}$ of
$\{A_k^{(n)}, B_k^{(n)}\}$ corresponding to samples $A(x, \omega)$ of $A(x)$. Let $q^* \in \mathcal{T}_n[0, T]$ be the
optimal trigonometric polynomial of order $n$ corresponding to a sample $A(x, \omega)$ of
$A(x)$, that is, $q^*(x)$ has the property $\| A(\cdot, \omega) - q^*(\cdot) \|_\infty = \min_{q \in \mathcal{T}_n[0, T]} \| A(\cdot, \omega) - q(\cdot) \|_\infty$.
The samples $\{A_k^{(n)}(\omega), B_k^{(n)}(\omega)\}$ of $\{A_k^{(n)}, B_k^{(n)}\}$ are set equal to the coefficients
$\{\alpha_k, \beta_k\}$ of $q^*(x)$.

Theorem 6.4 If almost all samples of $A(x)$ are continuous and periodic with period
$T > 0$, almost all samples of $A'(x) = dA(x)/dx$ are continuous, and the random
variable $\| A'(\cdot, \omega) \|_\infty$ is integrable, then

$$ \| A(\cdot, \omega) - A^{(n)}(\cdot, \omega) \|_\infty \leq \frac{\pi}{2(n + 1)} \| A'(\cdot, \omega) \|_\infty, \qquad
|\varphi_A(u) - \varphi_{A^{(n)}(x)}(u)| \leq \frac{\pi |u|}{2(n + 1)} E\big[ \| A'(\cdot, \omega) \|_\infty \big], \tag{6.59} $$

and

$$ E\big[ |\varphi_A(u) - \hat{\varphi}_{A^{(n)}(x)}(u)| \big] \leq \frac{\pi |u|}{2(n + 1)} E\big[ \| A'(\cdot, \omega) \|_\infty \big] + \big( \mathrm{Var}[\hat{\varphi}_{A^{(n)}(x)}(u)] \big)^{1/2}. \tag{6.60} $$

Proof We use a theorem for periodic functions with continuous derivatives stating
that if $h : \mathbb{R} \to \mathbb{R}$ is a continuous periodic function with period $T$ and has continuous
first order derivative, then $\min_{q \in \mathcal{T}_n[0, T]} \| h - q \|_\infty \leq \pi \| h' \|_\infty / [2(n + 1)]$
([30], Theorem 15.1). Since the samples $A(x, \omega)$ of $A(x)$ may not have the property
$A(0, \omega) = A(T, \omega)$, the theorem cannot be applied directly. If $A(0, \omega) \neq A(T, \omega)$,
there is no sequence of polynomials in $\mathcal{T}_n[0, T]$ that converges to $A(x, \omega)$ at $x = 0$
and $x = T$ ([38], Sect. 1.10). However, $A(x, \omega)$ can be altered such that its modified
version has equal values at the ends of $[0, T]$. Let

$$ \tilde{A}(x, \omega) = A(\tilde{x}, \omega)\, 1(0 \leq x \leq T - \varepsilon)
+ \big[ A(T, \omega) + (A(0, \omega) - A(T, \omega))(x - T + \varepsilon)/\varepsilon \big]\, 1(T - \varepsilon < x \leq T), \tag{6.61} $$

where $\varepsilon \in (0, T)$ is arbitrary and $x = (1 - \varepsilon/T)\, \tilde{x}$, $\tilde{x} \in [0, T]$. We have $\tilde{A}(0, \omega) = A(0, \omega)$,
$\tilde{A}(T - \varepsilon, \omega) = A(T, \omega)$, $\tilde{A}(x, \omega)$ for $x \in [0, T - \varepsilon]$ coincides with $A(\tilde{x}, \omega)$
for $\tilde{x} \in [0, T]$, $\tilde{A}(T, \omega) = A(0, \omega)$, and $\tilde{A}(x, \omega)$ in $[T - \varepsilon, T]$ is a line connecting the
points $\tilde{A}(T - \varepsilon, \omega) = A(T, \omega)$ and $\tilde{A}(T, \omega) = A(0, \omega)$. Higher order splines can be
used to define $\tilde{A}(x)$ in $[T - \varepsilon, T]$ such that, for example, its samples are continuously
differentiable provided the samples of $A(x)$ have this property. Similar arguments
can be used for real-valued random fields defined on rectangles $\times_{i=1}^{d} [0, T_i] \subset \mathbb{R}^d$ by
using properties of multivariate Fourier series ([38], Chap. 7). Alternatively, linear
models $A^{(n)}(x)$ can be constructed for $A(x)$ on sets including $D$ and the restriction
of these models to $D$ can be used for calculations.
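The modification in (6.61) can be sketched for a single deterministic sample path. The code below follows the reading of (6.61) in which the sample is compressed onto $[0, T - \varepsilon]$ through $x = (1 - \varepsilon/T)\tilde{x}$ and then joined linearly to its initial value; the sample path and the values of $T$ and $\varepsilon$ are hypothetical.

```python
def periodized(sample, period: float, eps: float):
    """Return the modified sample of (6.61) for a sample path on [0, period]."""
    if not 0.0 < eps < period:
        raise ValueError("eps must lie in (0, period)")
    a_start, a_end = sample(0.0), sample(period)

    def modified(x: float) -> float:
        if x <= period - eps:
            # Compressed copy of the sample: x = (1 - eps/T) * x_tilde.
            return sample(x / (1.0 - eps / period))
        # Straight line from A(T) at x = T - eps to A(0) at x = T.
        return a_end + (a_start - a_end) * (x - period + eps) / eps

    return modified


def path(x: float) -> float:
    """Hypothetical sample path with unequal end values, path(0) != path(2)."""
    return x * x + 1.0


tilde = periodized(path, 2.0, 0.5)
print(tilde(0.0), tilde(1.5), tilde(1.75), tilde(2.0))
```

The modified path has equal values at $x = 0$ and $x = T$, so that the approximation theorem for periodic functions applies to it.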

Assume without loss of generality that $A$ has the property $A(0) = A(T)$ almost
surely. The first inequality in (6.59) follows from [30] (Theorem 15.1). The second
inequality in this equation follows from

$$ |\varphi_{A^{(n)}(x)}(u) - \varphi_{A(x)}(u)| = \big| E\big[ e^{i u A(x)} \big( 1 - e^{i u (A^{(n)}(x) - A(x))} \big) \big] \big|
\leq E\big[ \big| 1 - e^{i u (A^{(n)}(x) - A(x))} \big| \big] $$
$$ \leq |u|\, E\big[ |A^{(n)}(x) - A(x)| \big] \leq \frac{\pi |u|}{2(n + 1)} E\big[ \| A'(\cdot, \omega) \|_\infty \big], $$

where we use the first inequality in (6.59) and $|1 - \exp(i \alpha)| \leq |\alpha|$, $\alpha \in \mathbb{R}$. The latter
inequality and the bound in (6.53) give (6.60). The resulting bound shows that for a
fixed $u \in \mathbb{R}$ the discrepancy between a target characteristic function $\varphi_A(u)$ and its
estimate $\hat{\varphi}_{A^{(n)}(x)}(u)$ corresponding to a linear model $A^{(n)}(x)$ of order $n$ and based on
$n_s$ independent samples of $A(x)$ can be made as small as desired by increasing the
model order and the sample size. Similar arguments can be used to construct bounds
on the discrepancy between estimators for finite dimensional characteristic functions
of $A(x)$ and $A^{(n)}(x)$. ▲

Example 6.4 Let $G(x)$, $x \in [0, 1]$, be a stationary Gaussian field with mean 0, variance 1,
and covariance function $\rho(\tau) = E[G(x) G(x + \tau)]$. The first two moments
of the lognormal translation field

$$ A(x) = \frac{e^{G(x)} - e^{1/2}}{(e^2 - e)^{1/2}} \tag{6.62} $$

are $E[A(x)] = 0$, $E[A(x)^2] = 1$, and $\xi(\tau) = E[A(x) A(x + \tau)] = (1 - e^{\rho(\tau)})/(1 - e)$
([12], Example 3.1), so that $\xi(\tau) = \exp(-\alpha |\tau|)$ for $\rho(\tau) = \ln\big( 1 + (\exp(1) - 1) \exp(-\alpha |\tau|) \big)$,
$\alpha > 0$. The Gaussian field $G(x)$ has continuous samples by the
Kolmogorov condition ([40], Proposition 4.2, and Theorem 3.1 in this book) since
$E[(G(x + \tau) - G(x))^2] \leq (2(e - 1)/2)\, \tau^{1+\beta}$ for any $\beta > 0$ and $\tau > 0$. Hence,
$A(x)$ also has continuous samples since the mapping $G(x) \mapsto A(x)$ is continuous.
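The relation between $\rho(\tau)$ and $\xi(\tau)$ in Example 6.4 can be verified numerically. The sketch below evaluates $(1 - e^{\rho(\tau)})/(1 - e)$ for the stated $\rho(\tau)$ and compares it with $\exp(-\alpha |\tau|)$; the function names are illustrative and the value $\alpha = 10$ is the one used later in the example.

```python
import math


def rho(tau: float, alpha: float) -> float:
    """Covariance function of the Gaussian image G(x) in Example 6.4."""
    return math.log(1.0 + (math.e - 1.0) * math.exp(-alpha * abs(tau)))


def xi(tau: float, alpha: float) -> float:
    """Covariance function of A(x): (1 - exp(rho)) / (1 - e)."""
    return (1.0 - math.exp(rho(tau, alpha))) / (1.0 - math.e)


def lognormal_translation(g: float) -> float:
    """Pointwise map (6.62) from a sample of G(x) to a sample of A(x)."""
    return (math.exp(g) - math.exp(0.5)) / math.sqrt(math.e ** 2 - math.e)


for tau in (0.0, 0.1, 1.0):
    print(tau, xi(tau, 10.0), math.exp(-10.0 * tau))
```

The two printed columns agree up to rounding, which confirms that the stated $\rho(\tau)$ yields an exponential covariance function for $A(x)$.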

Consider the linear model in (6.40) with basis functions $\{\theta_k(x)\}$ taken to be
modified Chebyshev polynomials $\{T_k(x)\}$ defined on $[0, 1]$, that is,

$$ A^{(n)}(x) = \sum_{k=0}^{n} C_k T_k(x), \quad x \in D = [0, 1], \tag{6.63} $$

where $k = 0$ is included in the summation for consistency with the indexing
of Chebyshev polynomials. The Chebyshev polynomials satisfy the recurrence
formula $T_{k+1}(x) = 2(2x - 1) T_k(x) - T_{k-1}(x)$, $k = 1, 2, \ldots$, with $T_0(x) = 1$
and $T_1(x) = 2x - 1$, and can be obtained from the classical Chebyshev polynomials
$T_k(u) = \cos(k \arccos(u))$, $k = 0, 1, \ldots$, defined on $[-1, 1]$ by the change of variable
$u = 2x - 1$, and the relationship $T_{k+1}(u) = 2u T_k(u) - T_{k-1}(u)$, $k = 1, 2, \ldots$,
with $T_0(u) = 1$ and $T_1(u) = u$. Note that the samples of $A^{(n)}(x)$ belong to a subspace
of $C[0, 1]$ spanned by $(T_0(x), T_1(x), \ldots, T_n(x))$, while the samples of $A(x)$
are members of $C[0, 1]$.
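The recurrence for the modified Chebyshev polynomials and the model in (6.63) can be evaluated as in the following sketch. The function names and the coefficient values are illustrative; in the example the coefficients are obtained by minimizing (6.48).

```python
import math


def shifted_chebyshev(n: int, x: float):
    """Values T_0(x), ..., T_n(x) of the modified polynomials on [0, 1]."""
    u = 2.0 * x - 1.0                 # change of variable to [-1, 1]
    values = [1.0, u]
    for _ in range(2, n + 1):
        values.append(2.0 * u * values[-1] - values[-2])
    return values[: n + 1]


def linear_model(coeffs, x: float) -> float:
    """A^(n)(x) = sum_{k=0}^{n} C_k T_k(x) as in (6.63)."""
    basis = shifted_chebyshev(len(coeffs) - 1, x)
    return sum(c * t for c, t in zip(coeffs, basis))


x = 0.2
for k, value in enumerate(shifted_chebyshev(4, x)):
    # The recurrence agrees with cos(k arccos(2x - 1)).
    print(k, value, math.cos(k * math.acos(2.0 * x - 1.0)))

print(linear_model([1.0, 2.0], 0.75))  # 1 + 2 (2 * 0.75 - 1)
```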

Figure 6.14 shows with solid and dotted lines five samples of $A(x)$ defined by
(6.62) with $\alpha = 10$ and the corresponding samples of $A^{(n)}(x)$, respectively. The
samples of $A^{(n)}(x)$ have been obtained by an optimization algorithm with objective
function given by (6.48) for $n = 4$, 9, 14, and 19. The difference between the
samples of $A(x)$ and $A^{(n)}(x)$ decreases with $n$, and becomes negligible at the figure
scale for $n = 19$. Figure 6.15 shows with thin solid and dotted lines the real and
the imaginary parts of estimates $\hat{\varphi}_{A^{(n)}(x)}(u)$ of the characteristic function of $A^{(n)}(x)$
in (6.63) with $n = 9$ at $x = 0.1$, 0.4, and 0.7 based on $n_s = 50$ (left panel)
and $n_s = 200$ (right panel) independent samples of $(C_1, \ldots, C_n)$ calculated by
minimizing the objective function in (6.48). The real and the imaginary parts of the
target characteristic function $\varphi_A(u)$ are shown with heavy solid and dotted lines,
respectively. The estimates of the characteristic function of $A^{(n)}(x)$ are consistent
with the target characteristic function and improve with the sample size.

The discrepancy between the target characteristic function $\varphi_{A(x)}(u)$ and its estimates
$\varphi_{A^{(n)}(x)}(u)$ has a component related to model accuracy and a component related to
statistical uncertainty corresponding to the first and the second terms of the bound
in (6.53). The plots in Fig. 6.15 show that the statistical component of the discrepancy
between $\varphi_{A(x)}(u)$ and $\varphi_{A^{(n)}(x)}(u)$ can be reduced by increasing the sample size
$n_s$. The model component of this discrepancy depends on the accuracy of the
representation $A^{(n)}(x)$ of $A(x)$, and can be reduced by increasing the model order $n$.

Fig. 6.14 Five samples of $A(x)$ (solid lines) and $A^{(n)}(x)$ (dotted lines) for $n = 4$ (top left panel),
$n = 9$ (top right panel), $n = 14$ (bottom left panel), and $n = 19$ (bottom right panel)


For example, the distance $\sup_{-15 \le u \le 15} |\varphi_{A(x)}(u) - \varphi_{A^{(n)}(x)}(u)|$ between $\varphi_{A(x)}(u)$ and
$\varphi_{A^{(n)}(x)}(u)$ is 0.2539, 0.2353, and 0.2904 at $x = 0.1$, 0.4, and 0.7 for $n = 3$, and
decreases to 0.1332, 0.1137, and 0.1749 at $x = 0.1$, 0.4, and 0.7 for $n = 9$. The
estimates $\varphi_{A^{(n)}(x)}(u)$ are based on $n_s = 50$ (left panels) and $n_s = 200$ (right panels)
independent samples of the random coefficients $\{C_k\}$ in the expression of $A^{(n)}(x)$
given by (6.63). ◊


6.4 Exercises 


Exercise 6.1 Extend the Bayesian analysis in (6.5) to (6.8) to an $\mathbb{R}^d$-valued Gaussian
variable with unknown second moment properties.

Exercise 6.2 Construct plots as in Fig. 6.4 for sets of 10 and more independent 
samples of X and various prior densities f {a, X, p). 

Exercise 6.3 Extend considerations in Example 6.1 to an $\mathbb{R}^d$-valued translation
vector with Beta distributed coordinates.

Fig. 6.15 Heavy solid and dotted lines are real and imaginary parts of $\varphi_{A(x)}(u)$. Thin solid and dotted
lines are estimates of real and imaginary parts of $\varphi_{A^{(n)}(x)}(u)$ at $x = 0.1$, 0.4, and 0.7 based on
$n_s = 50$ (left panels) and $n_s = 200$ (right panels) samples


Exercise 6.4 Develop estimates for extreme wind as in Fig. 6.6 using posterior den- 
sities for the uncertain parameters of the wind speed process, rather than point esti- 
mates. 

Exercise 6.5 Perform calculations as in Example 6.2 for a real-valued random field
defined on a bounded rectangle in $\mathbb{R}^d$, $d > 2$.

Exercise 6.6 Propose alternatives to the models in (6.22) and (6.23) that are consis- 
tent with the available information. 

Exercise 6.7 Repeat the calculations in Example 6.3 for A ( "' k and A having the 
same second moment properties as in this example but following different distribu- 
tions. 

Exercise 6.8 Let $X(t) = G(t)^3$, where $G(t) = A \cos(\nu t) + B \sin(\nu t)$, $\nu > 0$, and
$A$ and $B$ are independent $N(0, 1)$ variables. Show that $X(t)$ has the representation in (6.44)
with uncorrelated but dependent random coefficients. Develop a linear model with
dependent coefficients for $X(t)$.

Exercise 6.9 Let $G(x_1, x_2)$, $(x_1, x_2) \in [0, 1]^2$, be a real-valued Gaussian field
defined by $G(x_1, x_2) = (a_1^2 + a_2^2)^{-1/2} (a_1 G_1(x_1) + a_2 G_2(x_2))$, $(x_1, x_2) \in [0, 1]^2$,
where $a_1, a_2 \in \mathbb{R}$ and $G_r(x_r)$, $r = 1, 2$, are two independent real-valued, stationary
Gaussian random fields with mean 0, variance 1, and one-sided spectral densities
$g_r(\nu) = 1(0 < \nu < \nu_r)/\nu_r$, $\nu_r > 0$. The image $A(x_1, x_2)$ of $G(x_1, x_2)$ given by
(6.62) is a lognormal translation field defined in $[0, 1]^2$. Let

$$A^{(n)}(x_1, x_2) = \sum_{k,l=0}^{n} C_{kl}\, T_k(x_1)\, T_l(x_2), \quad (x_1, x_2) \in [0, 1]^2, \qquad (6.64)$$

be a linear model for $A(x_1, x_2)$, where $T_k(x_1)$ and $T_l(x_2)$ are Chebyshev polynomials
as in (6.63). Develop an optimization algorithm using the objective function in (6.48)
to calculate samples of $\{C_{kl}\}$ from samples of $A(x_1, x_2)$.


Chapter 7

Stochastic Ordinary Differential and Difference 
Equations 


7.1 Introduction 

Differential and difference equations with deterministic and/or random coefficients 
and input are used extensively in applications to describe the behavior of a broad 
range of physical systems. It is common to describe the states of dynamic systems 
in random environment by solutions of ordinary differential or difference equations 
with random input and deterministic or random coefficients depending on the uncer- 
tainty in the system properties. We refer to equations with deterministic coefficients 
and random input as stochastic equations, and equations with both random input and 
coefficients as stochastic equations with random coefficients. Differential and difference
equations with random coefficients and input can be divided into three classes: (i)
equations with state independent, time invariant random coefficients, (ii) equations 
with state independent, time dependent random coefficients, and (iii) equations with 
state dependent random coefficients. 

Equations with state independent, time invariant random coefficients may result 
from stochastic partial differential equations by spatial discretization via finite dif- 
ference, finite element, or other approximate methods [1-3]. The finite difference 
representation (1.5) of the partial differential transport equation (1.3) is an ordinary 
differential equation of this type. Depending on whether the time argument is left
continuous or discretized, the resulting equations are differential or difference equations with random
coefficients. A summary of methods for analyzing this class of stochastic equations 
can be found in [4] (Sects. 8.4.1, 8.8, and 9.2). 

Equations with state independent, time variant random coefficients are encoun- 
tered in economics, sociology, engineering, and other applied fields, and describe 
situations in which modeling conditions change over time. For example, the ampli- 
tude of a mathematical pendulum with length varying randomly in time according 
to a specified law satisfies a second order ordinary differential equation with random 
coefficients [5]. Differential equations with white noise coefficients, referred to in 
random vibration theory as equations with multiplicative noise, are used to establish 
conditions for the stability of the state of dynamic systems in random environment
([4], Sect. 8.7, [6]). Examples of equations of this type are in (1.7) and (1.9). Stability
conditions and moment equations have also been developed for solutions of
differential/difference equations with random coefficients evolving in time according
to semi-Markov processes [7, 8, 9, 10, 11, 12]. Difference equations with state
independent, time dependent random coefficients are used extensively in economics,
sociology, and stochastic hydrology [13, 14], and in a broad range of applications in
mechanics [7, 8, 9, 10, 11, 12]. This class of equations has also been employed to
construct non-Gaussian models ([15], Sect. 4.1). Some of the non-Gaussian models have
specified marginal distribution and correlation functions ([16] and [17], Sect. 3.6.1).

Equations with state dependent random coefficients are relevant in many applications.
For example, they have been used to characterize damage evolution in Daniels
systems [18, 19] and systems with initial cracks ([20], Sect. 7.5.2.3) subjected to
Gaussian actions, and to calculate the reliability of ideal elasto-plastic oscillators
subjected to Gaussian white noise [21]. Difference equations with state dependent random
coefficients, for example, GARCH and ARCH models ([22], Chap. 10) and
threshold autoregressive models ([15], Sect. 4.2), have been used to capture
market volatility in finance and to characterize river flows, respectively. The last section of
this chapter will present examples of differential equations of this type. 

A common feature of the solutions of differential/difference equations with ran- 
dom coefficients is that they are non-Gaussian processes/time series. The probability 
laws of these solutions are determined by the functional form of their defining equa- 
tions and the properties of both the random coefficients of and the input to these 
equations. Equations with state independent, time invariant random coefficients and 
with state dependent random coefficients are the simplest and most difficult to solve, 
respectively. The analysis of the latter type of equations poses notable difficulties 
since their coefficients are functionals of state history. 

Monte Carlo simulation is the only general method for solving equations with 
random coefficients. Analytical solutions for these equations have been obtained in 
special cases that, generally, are of limited practical interest. A host of approximate 
methods has been proposed for solving equations with random coefficients under 
the assumption that the uncertainty in their coefficients can be captured by a finite 
number of random variables. 

Let $X(t)$ be an $\mathbb{R}^d$-valued stochastic process satisfying the differential equation

$$dX(t) = a(X(t-), Y(t-))\,dt + b(X(t-), Y(t-))\,dS(t), \quad t > 0, \qquad (7.1)$$

where the entries of the $(d, 1)$ and $(d, d')$ matrices $a$ and $b$ are real-valued functions, $S$
denotes a $d'$-dimensional semimartingale, and $Y$ is an $\mathbb{R}^{\bar d}$-valued stochastic process
with specified probability law. The processes $S$ and $Y$ are assumed to be mutually
independent and independent of the initial state $X(0)$.

In the special case in which $Y(t)$ is time invariant, that is, an $\mathbb{R}^{\bar d}$-valued random
variable denoted by $\Theta$, (7.1) becomes

$$dX(t) = a(X(t-), \Theta)\,dt + b(X(t-), \Theta)\,dS(t), \quad t > 0. \qquad (7.2)$$

Note that (7.1) and (7.2) are stochastic differential equations of the type studied in
Sect. 5.5.1 on samples of $Y(t)$ and $\Theta$, respectively.

Definition 7.1 The stochastic equations in (7.1) and (7.2) are referred to as equations
with state independent, time variant and time invariant random coefficients,
respectively. If the coefficients $a$ and/or $b$ in (7.1) and (7.2) are functions of the entire
state history rather than its current value, in addition to $Y$ or $\Theta$, then (7.1) and (7.2)
are called stochastic equations with state dependent random coefficients.
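
The remark that (7.2) is an ordinary stochastic differential equation on each sample of the random coefficient suggests a two-stage Monte Carlo scheme: draw the coefficient, then generate the state conditional on it. The sketch below is illustrative only. It assumes the scalar model $dX(t) = -\Theta X(t)\,dt + \sigma\,dB(t)$ with $\Theta$ uniform on $(0.5, 1.5)$, a choice made here for the example and not taken from the text, and it uses the exact Gaussian transition law of this linear equation for a fixed value of $\Theta$.

```python
import numpy as np

def sample_state(n_samples, t, sigma=1.0, x0=0.0, seed=0):
    """Samples of X(t) for dX = -Theta X dt + sigma dB (hypothetical model)."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.5, 1.5, n_samples)      # one sample of Theta per path
    # Conditional on Theta, X(t) is Gaussian with the mean and variance below.
    mean = x0 * np.exp(-theta * t)
    var = sigma**2 * (1.0 - np.exp(-2.0 * theta * t)) / (2.0 * theta)
    return mean + np.sqrt(var) * rng.standard_normal(n_samples)

x = sample_state(200_000, t=20.0)
variance = x.var()
# Kurtosis of the mixture; a Gaussian variable would give the value 3.
kurtosis = np.mean((x - x.mean()) ** 4) / variance**2
```

Although the state is Gaussian for each fixed value of the coefficient, the unconditional state is a mixture of Gaussian variables, so its kurtosis exceeds 3 in this example.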

Our objectives are to establish conditions under which stochastic equations with 
deterministic and random coefficients have solutions that are unique, and explore 
methods for solving these equations. Heuristic and rigorous methods for solving sto- 
chastic equations are discussed. Both continuous and discrete time stochastic equa- 
tions are considered. Section 7.2 examines stochastic equations with deterministic 
coefficients of the type considered in random vibration. Discrete and continuous time 
systems are discussed in Sects. 7.2.1, 7.2.2 and 7.2.3. Developments in continuous 
time systems are based on Ito’s formula rather than heuristic arguments. Methods 
for solving discrete and continuous time systems with random coefficients and input 
are discussed in Sects. 7.3 and 7.4. The methods include Monte Carlo, conditional 
simulation, state augmentation, stochastic reduced order models, stochastic Galerkin, 
stochastic collocation, and simplified techniques for the case in which the uncertainty 
in the random coefficients of these equations is small. Theoretical developments in 
this chapter are applied in Sect. 7.5 to study stochastic stability and noise induced 
transitions for some dynamic systems, discuss a class of random vibration problems 
of practical interest, and study the behavior of a simple degrading system whose state 
satisfies a stochastic differential equation with state dependent coefficients. 


7.2 Stochastic Equations with Deterministic Coefficients 

Equations of the type in (7.2) with coefficients that do not depend on $\Theta$ are
considered. The driving noise $S$ can be Gaussian, Poisson, Lévy, or any semimartingale.
If the drift depends linearly on the state and the diffusion is state independent, the 
resulting equation defines a linear random vibration problem. Otherwise, we deal 
with a nonlinear random vibration problem. 

The presentation of essentials on discrete time linear systems is followed by 
heuristic and rigorous methods for solving linear and nonlinear random vibration 
problems. 


7.2.1 Discrete Time Linear Systems 

Let $X(n)$, $n = 0, 1, \ldots$, be an $\mathbb{R}^d$-valued time series defined by the recurrence formula

$$X(n + 1) = a(n)X(n) + b(n)W(n), \quad n = 0, 1, \ldots, \qquad (7.3)$$

where $a(n)$ and $b(n)$ denote $(d, d)$ and $(d, d')$ deterministic matrices, and $W(n)$,
$n = 0, 1, \ldots$, is an $\mathbb{R}^{d'}$-valued, weakly stationary, and uncorrelated time series
with mean vector $\mu_w(n) = E[W(n)]$ and covariance matrix $\gamma_w(n) = E[(W(n) - \mu_w(n))(W(n) - \mu_w(n))']$.
It is assumed that the initial state $X(0)$ and the driving noise $\{W(n)\}$
are uncorrelated.

We develop recurrence formulas for the mean $\mu(n) = E[X(n)]$, covariance
$\gamma(n) = E[(X(n) - \mu(n))(X(n) - \mu(n))']$, and covariance function $c(m, n) =
E[(X(n) - \mu(n))(X(m) - \mu(m))']$, $n > m$, of the system state. It is assumed that the mean
$\mu_0 = \mu(0) = E[X(0)]$ and the covariance $\gamma_0 = \gamma(0) = E[(X(0) - \mu_0)(X(0) - \mu_0)']$
of $X(0)$, the second moment properties of $\{W(n)\}$, and the matrices $a(n)$ and
$b(n)$ are known.

Theorem 7.1 The mean, covariance, and covariance function of the state vector
defined by (7.3) satisfy the equations

$$\mu(n + 1) = a(n)\mu(n) + b(n)\mu_w(n),$$
$$\gamma(n + 1) = a(n)\gamma(n)a(n)' + b(n)\gamma_w(n)b(n)', \text{ and}$$
$$c(m, n) = a(n, m)\gamma(m), \quad n > m, \qquad (7.4)$$

with initial conditions $\mu(0) = \mu_0$, $\gamma(0) = \gamma_0$, and $c(m, m) = \gamma(m)$, where $a(n, m)$
denotes the state transition matrix.

Proof The mean equation results by averaging (7.3). Since $\tilde X(n) = X(n) - \mu(n)$
satisfies the finite difference equation $\tilde X(n + 1) = a(n)\tilde X(n) + b(n)\tilde W(n)$ with
$\tilde W(n) = W(n) - \mu_w(n)$, the second equation in (7.4) results from the recurrence
formula for $\tilde X$ by direct calculations using the absence of correlation between $\tilde X(n)$
and $\tilde W(n)$. For the covariance function, $c(m, n) = E[\tilde X(n)\tilde X(m)']$ with

$$\tilde X(n) = a(n, m)\tilde X(m) + \sum_{k=m}^{n-1} a(n, k + 1)b(k)\tilde W(k), \quad n > m,$$

gives the third formula in (7.4), where $a(n, m) = \prod_{k=m}^{n-1} a(k) = a(n-1)a(n-2)\cdots a(m)$ with the convention
that $a(n, n) = I$ is the $(d, d)$-identity matrix. ▲

Consider the special case of a time invariant system driven by a weakly stationary
noise, that is, $a(n) = a$, $b(n) = b$, $\mu_w(n) = \mu_w$, and $\gamma_w(n) = \gamma_w$ in (7.3).
Under some conditions, $X(n)$ becomes a weakly stationary process as $n \to \infty$. For
example, the asymptotic mean $\mu = \lim_{n\to\infty} \mu(n)$ of $X(n)$ can be calculated from
$\mu = (I - a)^{-1} b \mu_w$ provided that all eigenvalues of $a$ are included in the open ball
with unit radius centered at the origin of $\mathbb{C}$. This follows from the first equality in
(7.4) whose asymptotic form is $\mu = a\mu + b\mu_w$. Similar results can be obtained for
the other moments of the state.

Example 7.1 Suppose $X(n)$ satisfies (7.3) with $d = 2$, $d' = 1$, $a_{11} = \beta_1 \ne 0$, $a_{22} =
\beta_2 \ne 0$, $a_{12} = a_{21} = 0$, and $b(n)$ is the unit vector. If $|\beta_k| < 1$, $k = 1, 2$, then
$X(n)$ becomes weakly stationary as $n \to \infty$ with means $\mu_k = \lim_{n\to\infty} E[X_k(n)] =
\mu_w/(1 - \beta_k)$, $k = 1, 2$, and covariances $\gamma_{kl} = \lim_{n\to\infty} E[\tilde X_k(n)\tilde X_l(n)] = \gamma_w/(1 -
\beta_k\beta_l)$, $k, l = 1, 2$. ◊

Proof The mean equation in (7.4) gives $\mu_k = \beta_k\mu_k + \mu_w$ as $n \to \infty$, so that
$\mu_k = \mu_w/(1 - \beta_k)$ under the stated assumptions. Similar calculations using the
second equality in (7.4) give stationary state covariances. ▲
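
The mean and covariance recurrences in (7.4) are simple to iterate numerically. The sketch below is an illustration under stated assumptions: the matrices are taken time invariant, the function name and the numerical values of the coefficients and noise moments are hypothetical, and the setting is that of Example 7.1. The iterates are compared with the stationary values quoted in the example.

```python
import numpy as np

def propagate_moments(a, b, mu_w, gamma_w, mu0, gamma0, n_steps):
    """Iterate the first two recurrences in (7.4) for constant a, b."""
    mu = np.array(mu0, dtype=float)
    gamma = np.array(gamma0, dtype=float)
    for _ in range(n_steps):
        mu = a @ mu + b @ mu_w                        # mean recurrence
        gamma = a @ gamma @ a.T + b @ gamma_w @ b.T   # covariance recurrence
    return mu, gamma

beta = np.array([0.5, -0.8])        # diagonal entries of a (illustrative values)
a = np.diag(beta)
b = np.ones((2, 1))                 # unit vector
mu_w = np.array([2.0])
gamma_w = np.array([[3.0]])

mu, gamma = propagate_moments(a, b, mu_w, gamma_w, np.zeros(2), np.zeros((2, 2)), 500)

# Stationary values from Example 7.1
mu_exact = mu_w[0] / (1.0 - beta)
gamma_exact = gamma_w[0, 0] / (1.0 - np.outer(beta, beta))
```

After a few hundred steps the iterates agree with the stationary formulas to machine precision, since the contribution of the initial state decays geometrically when both coefficients have modulus smaller than one.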

Note that the partial, second moment characterization of $W(n)$ in (7.3) is insufficient
to characterize $X(n)$ beyond its second moment properties. The probability law
of $\{W(n)\}$ needs to be specified for this purpose. For example, if the random variables
$\{W(n)\}$ are independent with characteristic function $\varphi_{W(n)}(u) = E[\exp(iuW(n))]$, the
characteristic function $\varphi_n(u) = E[\exp(iuX(n))]$ of $X(n)$ in (7.3) with $d = 1$ satisfies
the recurrence formula

$$\varphi_{n+1}(u) = E[e^{iu(a(n)X(n) + b(n)W(n))}] = \varphi_n(a(n)u)\,\varphi_{W(n)}(b(n)u), \quad u \in \mathbb{R}.$$

The formula can be applied to calculate the characteristic function of $X(n)$, $n \ge 1$,
starting from the characteristic function of the initial state $X(0)$. Similar arguments
can be used to develop recurrence formulas for joint characteristic functions. For
example, the characteristic function $\varphi_{n,n+2}(u, v) = E[\exp(i(uX(n) + vX(n + 2)))]$
of $(X(n), X(n + 2))$ can be calculated from

$$\varphi_{n,n+2}(u, v) = \varphi_n(u + a(n)a(n + 1)v)\,\varphi_{W(n)}(a(n + 1)b(n)v)\,\varphi_{W(n+1)}(b(n + 1)v)$$

since $X(n + 2) = a(n + 1)a(n)X(n) + a(n + 1)b(n)W(n) + b(n + 1)W(n + 1)$ and
$X(n)$, $W(n)$, and $W(n + 1)$ are independent random variables.
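
The recurrence for the marginal characteristic function can be coded directly as a composition of functions. The following sketch is only an illustration: it assumes constant scalar coefficients, a deterministic initial state $X(0) = 0$, and standard Gaussian noise, so that the result can be checked against the Gaussian characteristic function with variance given by (7.4). The function name is hypothetical.

```python
import numpy as np

def cf_recurrence(phi0, phi_w, a, b, n):
    """Return phi_n from phi_{k+1}(u) = phi_k(a u) phi_w(b u), constant a and b."""
    phi = phi0
    for _ in range(n):
        # Bind the current function as a default argument so each step keeps its own copy.
        phi = lambda u, prev=phi: prev(a * u) * phi_w(b * u)
    return phi

a, b, n = 0.5, 1.0, 10
phi0 = lambda u: np.ones_like(u, dtype=complex)   # X(0) = 0
phi_w = lambda u: np.exp(-0.5 * u**2)             # W(n) ~ N(0, 1), an assumption

u = np.linspace(-5.0, 5.0, 101)
phi_n = cf_recurrence(phi0, phi_w, a, b, n)(u)

# Reference: X(n) is Gaussian with mean 0 and variance from the recurrence in (7.4).
gamma = 0.0
for _ in range(n):
    gamma = a**2 * gamma + b**2
phi_gauss = np.exp(-0.5 * gamma * u**2)
```

Replacing `phi_w` by the characteristic function of a non-Gaussian noise requires no other change, which is the practical appeal of the recurrence.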


7.2.2 Continuous Time Linear Systems 

The continuous time analog of (7.3) is 

$$\dot X(t) = a(t)X(t) + b(t)W(t), \quad t > 0, \qquad (7.5)$$

where $X(t)$ is an $\mathbb{R}^d$-valued stochastic process, $W(t)$ denotes an $\mathbb{R}^{d'}$-valued white
noise process with mean $\mu_w(t)$ and covariance function $E[(W(s) - \mu_w(s))(W(t) -
\mu_w(t))'] = \gamma_w(t)\delta(s - t)$, $\gamma_w(t)$ is a positive definite $(d', d')$ matrix, and $a(t)$ and
$b(t)$ are $(d, d)$ and $(d, d')$ matrices. The initial state $X(0)$ has mean $\mu_0$ and covariance
$\gamma_0$, and is uncorrelated with the driving noise.

The classical linear random vibration theory provides equations for the mean
$\mu(t) = E[X(t)]$, covariance $\gamma(t) = E[\tilde X(t)\tilde X(t)']$, and covariance function
$c(s, t) = E[\tilde X(t)\tilde X(s)']$ of $X(t)$ defined by (7.5), where $\tilde X(t) = X(t) - \mu(t)$. The
development of these equations is based on heuristic arguments, calculations, and
definitions of white noise. For example, note that $\dot X(t)$ in (7.5) is not defined since
$W(t)$ does not exist in the second moment sense. Formal calculations used in the
classical theory of linear random vibration give

$$\dot\mu(t) = a(t)\mu(t) + b(t)\mu_w(t),$$
$$\dot\gamma(t) = a(t)\gamma(t) + \gamma(t)a(t)' + b(t)\gamma_w(t)b(t)', \text{ and}$$
$$\frac{\partial c(s, t)}{\partial t} = a(t)c(s, t), \quad t > s, \qquad (7.6)$$

with initial conditions $\mu(0) = \mu_0$, $\gamma(0) = \gamma_0$, and $c(s, s) = \gamma(s)$ ([23], Sect. 6.2,
and [20], Sect. 5.2.1). The derivation of these equations uses the representation

$$X(t) = a(t, 0)X(0) + \int_0^t a(t, s)b(s)W(s)\,ds, \quad t > 0, \qquad (7.7)$$

where the system transition matrix $a(t, s)$, $t > s$, is the solution of

$$\frac{\partial a(t, s)}{\partial t} = a(t)a(t, s), \quad t > s, \qquad (7.8)$$

with $a(s, s) = I$ ([24], Sect. 1.3 and Theorem 1, p. 40).

Example 7.2 Let $X(t)$, $t > 0$, be the solution of (7.5) with $d = d' = 1$, $a(t) = -\rho$,
$\rho > 0$, $b(t) = 1$, $\mu_w(t) = 0$, $\gamma_w(t) = \gamma_w$, and $\mu_0 = 0$. The differential equations
$\dot\mu(t) = -\rho\mu(t)$, $\dot\gamma(t) = -2\rho\gamma(t) + \gamma_w$, and $\partial c(s, t)/\partial t = -\rho c(s, t)$ with
$\mu(0) = 0$, $\gamma(0) = \gamma_0$, and $c(s, s) = \gamma(s)$ given by (7.6) have the solutions
$\mu(t) = 0$, $\gamma(t) = (\gamma_0 - \gamma_w/(2\rho))\exp(-2\rho t) + \gamma_w/(2\rho)$, and $c(s, t) =
\gamma(s)\exp(-\rho(t - s))$, $t > s$. Additional examples can be found in [20] (Chap. 5). ◊
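
The closed-form solutions of Example 7.2 can be checked by integrating the moment equations in (7.6) numerically. The sketch below uses a hand-written fourth order Runge-Kutta step so that it has no dependencies beyond the standard library; the parameter values and the helper name `rk4` are assumptions made for illustration.

```python
import math

def rk4(f, y0, duration, n_steps):
    """Integrate dy/dt = f(t, y) from t = 0 to t = duration with classical RK4."""
    h, y, t = duration / n_steps, y0, 0.0
    for _ in range(n_steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y

rho, gamma_w, gamma0 = 1.5, 2.0, 0.3     # illustrative values
s, t_end = 0.5, 2.0

def gamma_closed(t):
    return (gamma0 - gamma_w / (2 * rho)) * math.exp(-2 * rho * t) + gamma_w / (2 * rho)

# Variance equation: d gamma / dt = -2 rho gamma + gamma_w, gamma(0) = gamma0
gamma_num = rk4(lambda t, g: -2 * rho * g + gamma_w, gamma0, t_end, 2000)

# Covariance function: dc/dt = -rho c for t > s, started from c(s, s) = gamma(s).
# The equation is autonomous, so only the elapsed time t_end - s matters.
c_num = rk4(lambda t, c: -rho * c, gamma_closed(s), t_end - s, 2000)
c_closed = gamma_closed(s) * math.exp(-rho * (t_end - s))
```

The same integrator applies unchanged to time dependent coefficients, for which closed-form solutions of (7.6) are generally not available.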

The heuristic approach of classical linear random vibration theory delivers differential
equations for the second moment properties of $X(t)$, but cannot be extended
to find higher order properties of this process. This is a significant limitation since
higher order statistics of $X(t)$ are sensitive to noise type. For example, let $X(t)$ be the
solution of (7.5) with $d = d' = 1$, $a(t) = -\rho$, $\rho > 0$, $b(t) = \sqrt{2\rho}$, and suppose
$W(t)$ is a Gaussian or a Poisson white noise, so that $X(t)$ is the solution of

$$dX(t) = -\rho X(t)\,dt + \sqrt{2\rho}\,dB(t) \text{ or}$$
$$dX(t) = -\rho X(t)\,dt + \sqrt{2\rho}\,dC(t), \qquad (7.9)$$

where $B(t)$ is a Brownian motion, $C(t) = \sum_{k=1}^{N(t)} Y_k$ denotes a compound Poisson
process, $N(t)$ is a homogeneous Poisson process with intensity $\lambda > 0$, and $\{Y_k\}$ are
iid random variables with finite variance. If $E[Y_1] = 0$ and $\lambda E[Y_1^2] = 1$, then $B(t)$
and $C(t)$ are equal in the second moment sense so that the processes $X(t)$ in (7.9) also
have the same second moment properties. Yet, their higher order properties differ
significantly. Figure 7.1 shows histograms of $\max_{0 \le t \le 100}\{X(t)\}$ for $\rho = 1$ constructed
from 5000 independent samples of $X(t)$. Although the processes $X(t)$ in this figure
are equal in the second moment sense, the histograms of $\max_{0 \le t \le 100}\{X(t)\}$ under
Poisson white noise for various values of $\lambda$ and under Gaussian white noise differ
significantly. The histograms of $\max_{0 \le t \le 100}\{X(t)\}$ under Poisson white noise approach
the histogram of $\max_{0 \le t \le 100}\{X(t)\}$ under Gaussian white noise as $\lambda$ increases since
$C(t)$ with $\lambda E[Y_1^2] = 1$ converges weakly to $B(t)$ as $\lambda \to \infty$ (Theorem 7.23).

Fig. 7.1 Histograms of $\max_{0 \le t \le 100}\{X(t)\}$ under Gaussian and Poisson white noise for $\rho = 1$

The remainder of this section reformulates the linear random vibration problem
in (7.5) by defining white noise processes as formal derivatives of Brownian or other 
processes and using Ito’s formula to construct differential equations for moments 
and other properties of X(t). Linear random vibration problems are defined by 

$$dX(t) = a(t)X(t)\,dt + b(t)\,dS(t), \quad t > 0, \qquad (7.10)$$

where $S(t)$ can be an $\mathbb{R}^{d'}$-valued semimartingale. As previously, the initial state $X(0)$
is assumed to be independent of the noise. Our focus is on Gaussian and Poisson
white noise processes as in (7.9), that is, $S(t)$ is either $B(t)$ or $C(t)$. If $S(t) = B(t)$,
then $X(t)$ is a Gaussian process, so that its probability law is completely defined by
its second moment properties.
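
The sensitivity of higher order statistics to noise type can be illustrated by simulating the two equations in (7.9). The sketch below is a rough illustration, not the calculation behind Fig. 7.1: it uses an Euler scheme, assumes Gaussian jumps $Y_k \sim N(0, 1/\lambda)$ so that $E[Y_1] = 0$ and $\lambda E[Y_1^2] = 1$, and compares the kurtosis of $X(t)$ at a fixed time rather than the histogram of the maximum. Function names and parameter values are hypothetical.

```python
import numpy as np

def simulate(noise, rho=1.0, lam=1.0, t_end=10.0, dt=0.01, n_paths=40_000, seed=1):
    """Euler samples of X(t_end) for (7.9) under 'gaussian' or 'poisson' noise."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n_paths)
    for _ in range(int(round(t_end / dt))):
        z = rng.standard_normal(n_paths)
        if noise == "gaussian":
            ds = np.sqrt(dt) * z                        # Brownian increment
        else:
            n_jumps = rng.poisson(lam * dt, n_paths)    # jumps of N(t) in the step
            ds = np.sqrt(n_jumps / lam) * z             # sum of n_jumps N(0, 1/lam) jumps
        x = x - rho * x * dt + np.sqrt(2.0 * rho) * ds
    return x

def kurtosis(x):
    xc = x - x.mean()
    return np.mean(xc**4) / np.mean(xc**2) ** 2

x_gauss = simulate("gaussian")
x_poisson = simulate("poisson")
```

Both sample sets have variance close to 1, in agreement with the second moment equivalence of the two noises, while the kurtosis under Poisson white noise is well above the Gaussian value 3 for a small intensity such as $\lambda = 1$.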

Theorem 7.2 If $S(t) = B(t)$ in (7.10) is an $\mathbb{R}^{d'}$-valued Brownian motion with independent
coordinates, the mean vector $\mu(t) = E[X(t)]$, covariance matrix $\gamma(t) =
E[\tilde X(t)\tilde X(t)']$, and covariance function $c(s, t) = E[\tilde X(t)\tilde X(s)']$, $t > s$, of the
solution $X(t)$ of this equation can be obtained from (7.6) with $b(t)b(t)'$ in place of
$b(t)\gamma_w(t)b(t)'$, where $\tilde X(t) = X(t) - \mu(t)$.

Proof Ito’s formula for $\mathbb{R}^d$-valued continuous semimartingales in Theorem 5.3
applied to the mappings $x \mapsto g(x) = x_p$ and $x \mapsto g(x) = x_p x_q$ gives by averaging
equations for the mean and the covariance functions of $X(t)$, where $x_p$ and $x_q$ are
coordinates of $x \in \mathbb{R}^d$. For example, this formula applied to $g(x) = x_p x_q$ gives

$$X_p(t)X_q(t) - X_p(0)X_q(0) = \sum_{i=1}^{d} \int_0^t \left(\delta_{ip}X_q(s) + X_p(s)\delta_{iq}\right) dX_i(s)
+ \frac{1}{2} \sum_{i,j=1}^{d} \sum_{k=1}^{d'} \int_0^t \left(\delta_{ip}\delta_{jq} + \delta_{jp}\delta_{iq}\right) b_{ik}(s)b_{jk}(s)\,ds$$
$$= \int_0^t \Big[X_q(s) \sum_{j=1}^{d} a_{pj}(s)X_j(s) + X_p(s) \sum_{j=1}^{d} a_{qj}(s)X_j(s)\Big] ds
+ \frac{1}{2} \int_0^t \left[(b(s)b(s)')_{pq} + (b(s)b(s)')_{qp}\right] ds + \text{martingale}(t).$$

The expectation of the left side of this equation is $r_{pq}(t) - r_{pq}(0)$, where $r(t) =
E[X(t)X(t)']$ denotes the correlation matrix of $X(t)$. The right side has three terms.
The terms of the form $\int_0^t X_q(s)\,dB_r(s)$ have zero expectation. The order of integration
can be changed in the first two terms on the right side of the equation by Fubini’s
theorem since $X(t)$ is measurable with respect to both arguments and has finite
expectation. For example, the first term becomes $\sum_{j=1}^{d} \int_0^t a_{pj}(s)r_{qj}(s)\,ds$. Differentiation
with respect to time gives

$$\dot r_{pq}(t) = \sum_{j=1}^{d} a_{pj}(t)r_{qj}(t) + \sum_{j=1}^{d} a_{qj}(t)r_{pj}(t) + \frac{1}{2}\left[(b(t)b(t)')_{pq} + (b(t)b(t)')_{qp}\right]$$
$$= (a(t)r(t))_{pq} + (r(t)a(t)')_{pq} + (b(t)b(t)')_{pq}, \quad p, q = 1, \ldots, d,$$

which yields the second equality in (7.6) for the case in which $\gamma_w(t)$ is the identity
matrix. The matrix $\gamma_w(t)$ is identity since $B(t)$ has independent coordinates.

The last equation in (7.6) results from the expectation of the integral form $\tilde X(t) -
\tilde X(s) = \int_s^t a(u)\tilde X(u)\,du + \int_s^t b(u)\,dB(u)$, $t > s$, of (7.10) following multiplication
with $\tilde X(s)'$, expectation, and differentiation with respect to time. We have

$$E[(\tilde X(t) - \tilde X(s))\tilde X(s)'] = E\Big[\Big(\int_s^t a(u)\tilde X(u)\,du + \int_s^t b(u)\,dB(u)\Big)\tilde X(s)'\Big].$$

The left side of this equation is $c(s, t) - c(s, s)$. The right side is $\int_s^t a(u)c(s, u)\,du$
by Fubini’s theorem, the independence between the random variables $\int_s^t b(u)\,dB(u)$
and $\tilde X(s)$, and $E[\int_s^t b(u)\,dB(u)] = 0$. The differentiation of the resulting equation
gives the last formula in (7.6). ▲

Arguments used to prove Theorem 7.2 can be applied to derive equations for
moments of order three and higher, characteristic functions, and other descriptors
of $X(t)$. These equations are only needed for systems driven by non-Gaussian noise
since, generally, their states are non-Gaussian processes.

Example 7.3 Let $X(t)$, $t > 0$, satisfy the first equation in (7.9) with $\rho > 0$
and $Y(t)^k\,dt$ in place of $\sqrt{2\rho}\,dB(t)$, that is, $dX(t) = (-\rho X(t) + Y(t)^k)\,dt$, where
$k \ge 1$ is an integer and $Y(t)$ is defined
by $dY(t) = -\alpha Y(t)\,dt + \sqrt{2\alpha}\,dB(t)$, $t > 0$, with $\alpha > 0$. The moments
$\mu(p, q; t) = E[X(t)^p Y(t)^q]$ of order $p + q$, $p, q \ge 0$, of $(X(t), Y(t))$ satisfy
the differential equation

$$\dot\mu(p, q; t) = -p\rho\,\mu(p, q; t) + p\,\mu(p - 1, q + k; t) - q\alpha\,\mu(p, q; t) + \alpha q(q - 1)\,\mu(p, q - 2; t) \qquad (7.11)$$

with the convention $\mu(p, q; t) = 0$ if $p$ and/or $q$ are negative. The initial condition
$\mu(p, q; 0)$ results from the properties of $(X(0), Y(0))$, which need to be specified.

The moment equations in (7.11) are closed so that they can be solved exactly.
Suppose that our objective is to solve moment equations with $p + q = m$ and that
moment equations with $p + q < m$ have been solved. There are $m + 1$ moment
equations with $p + q = m \ge 1$ involving, in addition to $m + 1$ moments of the
type $\mu(p, q; t)$, the moments $\mu(p - 1, q + k; t)$ and $\mu(p, q - 2; t)$ which need to
be obtained prior to solving these moment equations. The moments $\mu(p, q - 2; t)$
are available since they are solutions of moment equations with $p + q < m$. The
moments $\mu(p - 1, q + k; t)$ are unknown, but can be obtained from (7.11). The
moment equation for $p = 1$ and $q = r$ is $\dot\mu(1, r; t) = -\rho\mu(1, r; t) + \mu(0, r +
k; t) - \alpha r\mu(1, r; t) + \alpha r(r - 1)\mu(1, r - 2; t)$, and can be solved sequentially for
increasing $r$ using the fact that $\mu(0, r + k; t)$ is known since $Y(t)$ is Gaussian. These
equations give $\mu(1, r; t)$ for any $r \ge 1$. Similar calculations for $p = 2$ and increasing
$r$ deliver $\mu(2, r; t)$, and so on.

For example, there are two moment equations for $m = 1$ involving the moments
$\mu(1, 0; t)$, $\mu(0, 1; t)$, and $\mu(0, k; t)$. Since $\mu(0, k; t)$ can be calculated from the
defining equations for $Y(t)$, it is possible to find $\mu(1, 0; t)$ and $\mu(0, 1; t)$. There
are three moment equations for $m = 2$ involving the moments $\mu(2, 0; t)$, $\mu(1, 1; t)$,
$\mu(0, 2; t)$, $\mu(1, k; t)$, and $\mu(0, k + 1; t)$. To solve these equations, we need only
$\mu(1, k; t)$ since $\mu(0, k + 1; t)$ is available. The sequence of equations for $p = 1$ and
increasing $q = r$ mentioned previously can be used to find $\mu(1, k; t)$. Additional
examples of linear systems driven by polynomials of filtered Gaussian and Poisson
processes can be found in [4] (Examples 7.23, 7.26, and 7.28). ◊
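
The sequential solution described above is easy to see in the stationary regime, where the left side of (7.11) vanishes and each moment follows from moments that are already available. The sketch below is an illustration under that stationarity assumption, with $Y(t)$ having unit stationary variance; the function name and parameter values are hypothetical.

```python
from functools import lru_cache

def stationary_moments(rho, alpha, k):
    """Return mu(p, q) = E[X^p Y^q] in the stationary regime of (7.11)."""
    @lru_cache(maxsize=None)
    def mu(p, q):
        if p < 0 or q < 0:
            return 0.0                      # convention stated with (7.11)
        if p == 0 and q == 0:
            return 1.0
        # Setting the time derivative in (7.11) to zero and solving for mu(p, q):
        rhs = p * mu(p - 1, q + k) + alpha * q * (q - 1) * mu(p, q - 2)
        return rhs / (p * rho + q * alpha)
    return mu

rho, alpha = 1.3, 0.7                       # illustrative values
mu = stationary_moments(rho, alpha, k=2)
variance_x = mu(2, 0) - mu(1, 0) ** 2
```

The recursion terminates because each call lowers either $p$ or $q$, which mirrors the ordering of the moment equations described in the example. For $p = 0$ it reproduces the moments of a standard Gaussian variable.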

Proof Note that $(X(t), Y(t))$ is a bivariate diffusion process defined by the stochastic
differential equation

$$d\begin{bmatrix} X(t) \\ Y(t) \end{bmatrix} = \begin{bmatrix} -\rho X(t) + Y(t)^k \\ -\alpha Y(t) \end{bmatrix} dt + \begin{bmatrix} 0 \\ \sqrt{2\alpha} \end{bmatrix} dB(t).$$

The integral form of Ito’s formula in (5.15) applied to the mapping $(X(t), Y(t)) \mapsto
X(t)^p Y(t)^q$ gives

$$X(t)^p Y(t)^q - X(0)^p Y(0)^q = \int_0^t pX(s)^{p-1}Y(s)^q\,dX(s) + \int_0^t qX(s)^p Y(s)^{q-1}\,dY(s)
+ \frac{1}{2}\int_0^t q(q - 1)X(s)^p Y(s)^{q-2}(2\alpha)\,ds.$$

The expectation of the left side of this equation is $\mu(p, q; t) - \mu(p, q; 0)$. The first
and second integrals on the right side of the equation are $p\int_0^t (-\rho X(s)^p Y(s)^q
+ X(s)^{p-1}Y(s)^{q+k})\,ds$ and $\int_0^t qX(s)^p Y(s)^{q-1}(-\alpha Y(s)\,ds + \sqrt{2\alpha}\,dB(s))$ by the
definition of $(X(t), Y(t))$. Since $(X(t), Y(t))$ is measurable in both arguments and
integrable, the expectation of, for example, the first integral is $p\int_0^t (-\rho\mu(p, q; s) +
\mu(p - 1, q + k; s))\,ds$ by Fubini’s theorem. Similar arguments show that the
expectation of the above expression is

$$\mu(p, q; t) - \mu(p, q; 0) = p\int_0^t \left(-\rho\mu(p, q; s) + \mu(p - 1, q + k; s)\right) ds
- \alpha q\int_0^t \mu(p, q; s)\,ds + \frac{q(q - 1)}{2}(2\alpha)\int_0^t \mu(p, q - 2; s)\,ds.$$

Differentiation with respect to time $t$ yields (7.11). ▲

The formulas in (7.6) cannot be used to find the second moment properties for the 
stochastic process in Example 7.3 since the driving noise is not white. Two options 
are available. We can extend the formulas in (7.6) such that they apply to colored 
noise ([20], Sect. 5.2.1) or perform direct calculations by using properties of Gaussian 
variables and/or multiple Wiener-Ito integrals. These options are illustrated by the 
following example. 

Example 7.4 The stationary covariance function of $X(t)$ in Example 7.3 with $k = 2$
has the expression

$$c(t, s) = \frac{2}{\rho(\rho + 2\alpha)}\,e^{-\rho(t-s)} + \frac{2}{\rho^2 - 4\alpha^2}\left(e^{-2\alpha(t-s)} - e^{-\rho(t-s)}\right), \quad t > s, \qquad (7.12)$$

and can be obtained by an extension of the mean and covariance equations in (7.6)
to the case in which the driving noise is colored or by direct calculations. ◊

Proof Consider first an extension of (7.7). Let $X(t)$ be defined by (7.5) with $Z(t)$
in place of $W(t)$, where $Z(t)$ is an $\mathbb{R}^{d'}$-valued colored noise whose coordinates have
finite variance, so that it can be calculated from (7.7) with $W(t)$ replaced by $Z(t)$. The
centered process $\tilde X(t) = X(t) - E[X(t)]$ satisfies the equation

$$\tilde X(t) = a(t, 0)\tilde X(0) + \int_0^t a(t, s)b(s)\tilde Z(s)\,ds, \quad t > 0, \qquad (7.13)$$

where $\tilde Z(t) = Z(t) - E[Z(t)]$. Let $\gamma_0 = E[\tilde X(0)\tilde X(0)']$ and $c_z(u, v) = E[\tilde Z(u)\tilde Z(v)']$
denote the covariance of the initial state and the covariance function of $Z$. For $t > s$,
we have

$$\tilde X(t)\tilde X(s)' = \Big(a(t, 0)\tilde X(0) + \int_0^t a(t, u)b(u)\tilde Z(u)\,du\Big)\Big(a(s, 0)\tilde X(0) + \int_0^s a(s, v)b(v)\tilde Z(v)\,dv\Big)'$$
$$= a(t, 0)\tilde X(0)\tilde X(0)'a(s, 0)' + a(t, 0)\tilde X(0)\int_0^s \tilde Z(v)'b(v)'a(s, v)'\,dv
+ \Big(\int_0^t a(t, u)b(u)\tilde Z(u)\,du\Big)\tilde X(0)'a(s, 0)'$$
$$+ \int_0^t\!\!\int_0^s a(t, u)b(u)\tilde Z(u)\tilde Z(v)'b(v)'a(s, v)'\,du\,dv,$$

which gives

$$c(t, s) = a(t, 0)\gamma_0\,a(s, 0)' + \int_0^t\!\!\int_0^s a(t, u)b(u)c_z(u, v)b(v)'a(s, v)'\,du\,dv$$

by expectation, since the initial state and the noise are uncorrelated, so that

$$\frac{\partial c(t, s)}{\partial t} = a(t)c(t, s) + d(t, s), \quad \text{where } d(t, s) = b(t)\int_0^s c_z(t, v)b(v)'a(s, v)'\,dv, \qquad (7.14)$$

by differentiation with respect to $t$ (Exercise 7.6). The latter equation can be solved
for the initial condition $\gamma(s) = c(s, s)$ that can be determined from

$$\dot\gamma(t) = a(t)\gamma(t) + \gamma(t)a(t)' + d(t, t) + d(t, t)' \qquad (7.15)$$

with the initial condition $\gamma(0) = \gamma_0$.

For the real-valued process $Z(t) = Y(t)^2$ considered in this example, we find
$c_z(t, s) = E[Y(t)^2 Y(s)^2] - E[Y(t)^2]E[Y(s)^2] = 2\exp(-2\alpha(t - s))$ by using
properties of Gaussian variables. The covariance function in (7.12) is obtained by
solving the differential equations $\dot\gamma(t) = -2\rho\gamma(t) + 2d(t, t)$ and $\partial c(t, s)/\partial t =
-\rho c(t, s) + d(t, s)$, where $d(t, s) = 2(e^{-2\alpha(t-s)} - e^{-2\alpha t - \rho s})/(2\alpha + \rho)$.

We now use direct calculations to find the expression of $c(t, s)$. The integral form
of the equation defining $X(t)$ is

$$X(t) - X(s) = -\rho\int_s^t X(u)\,du + \int_s^t Y(u)^k\,du, \quad t > s,$$

which gives $r(t, s) - r(s, s) = -\rho\int_s^t r(u, s)\,du + \int_s^t E[Y(u)^k X(s)]\,du$ by
multiplication with $X(s)$ and taking the expectation, where $r(t, s) = E[X(t)X(s)]$ and
$k \ge 1$ is an integer. This equation gives

$$\frac{\partial r(t, s)}{\partial t} = -\rho\,r(t, s) + E[Y(t)^k X(s)], \quad t > s,$$

by differentiation with respect to $t$. For $t > s$, we have $Y(t) = e^{-\alpha t}\big(Y(s)e^{\alpha s} +
\sqrt{2\alpha}\,I(s, t)\big)$ with $I(s, t) = \int_s^t e^{\alpha u}\,dB(u)$, so that

$$E[Y(t)^k X(s)] = E\Big[e^{-\alpha k t}\big(Y(s)e^{\alpha s} + \sqrt{2\alpha}\,I(s, t)\big)^k X(s)\Big]$$
$$= e^{-\alpha k t}\sum_{q=0}^{k} \frac{k!}{q!\,(k - q)!}\,e^{\alpha(k-q)s}(2\alpha)^{q/2}\,E[Y(s)^{k-q}X(s)]\,E[I(s, t)^q]$$

since $I(s, t)$ is independent of $X(s)$ and $Y(s)$. The expectation $E[Y(s)^{k-q}X(s)] =
\mu(1, k - q; s)$ has been calculated in Example 7.3. The variable $I(s, t)$ is Gaussian
with mean 0 and variance $\int_s^t e^{2\alpha u}\,du = (e^{2\alpha t} - e^{2\alpha s})/(2\alpha)$ by Ito’s isometry
(Sect. 4.4). We have $E[Y(t)^2 X(s)] = e^{-2\alpha t}\big[e^{2\alpha s}\mu(1, 2; s) + 2\alpha\,\mu(1, 0; s)
E[I(s, t)^2]\big]$, so that $E[Y(t)^2 X(s)] = (\mu(1, 2) - \mu(1, 0))e^{-2\alpha(t-s)} + \mu(1, 0)$ in the
stationary regime, which yields (7.12) since $c(t, s) = r(t, s) - E[X(t)]E[X(s)]$,
$\mu(1, 0) = 1/\rho$, and $\mu(1, 2) = (2\alpha + 3\rho)/[\rho(2\alpha + \rho)]$. ▲
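
The covariance function of Example 7.4 can be checked against the differential equations used in its proof. The sketch below evaluates the stationary covariance $c(t, s) = \frac{2}{\rho(\rho + 2\alpha)} e^{-\rho\tau} + \frac{2}{\rho^2 - 4\alpha^2}(e^{-2\alpha\tau} - e^{-\rho\tau})$ as a function of the lag $\tau = t - s$ and verifies by finite differences that it satisfies $\partial c/\partial t = -\rho c + d$, with $d$ taken as the stationary limit $2e^{-2\alpha\tau}/(2\alpha + \rho)$ of the forcing term for $Z = Y^2$. The function names and the parameter values, chosen with $\rho \ne 2\alpha$, are assumptions made for illustration.

```python
import math

def cov_712(tau, rho, alpha):
    """Stationary covariance of Example 7.4 as a function of the lag tau = t - s >= 0."""
    first = 2.0 / (rho * (rho + 2.0 * alpha)) * math.exp(-rho * tau)
    second = 2.0 / (rho**2 - 4.0 * alpha**2) * (
        math.exp(-2.0 * alpha * tau) - math.exp(-rho * tau)
    )
    return first + second

def d_stationary(tau, rho, alpha):
    """Stationary limit of the forcing term d(t, s) for Z = Y^2."""
    return 2.0 * math.exp(-2.0 * alpha * tau) / (2.0 * alpha + rho)

rho, alpha, tau, h = 1.3, 0.4, 0.8, 1e-5

# Central difference approximation of the derivative with respect to t at fixed s
dc_dtau = (cov_712(tau + h, rho, alpha) - cov_712(tau - h, rho, alpha)) / (2.0 * h)
residual = dc_dtau - (-rho * cov_712(tau, rho, alpha) + d_stationary(tau, rho, alpha))

# Stationary form of the variance equation: 0 = -2 rho gamma + 2 d(t, t)
variance_residual = -2.0 * rho * cov_712(0.0, rho, alpha) + 2.0 * d_stationary(0.0, rho, alpha)
```

Both residuals vanish up to discretization and rounding errors, which confirms that the two terms of the covariance expression and the forcing term are mutually consistent.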

Example 7.5 The characteristic function $\varphi(u, v; t) = E[e^{i(uX(t) + vY(t))}]$ of $(X(t),
Y(t))$ in the previous example satisfies the partial differential equation

$$\frac{\partial\varphi}{\partial t} = -\rho u\,\frac{\partial\varphi}{\partial u} + (-1)^k\,i^{k+1}\,u\,\frac{\partial^k\varphi}{\partial v^k} - \alpha v\,\frac{\partial\varphi}{\partial v} - \alpha v^2\varphi$$

with boundary conditions resulting from the equality of $\partial^{p+q}\varphi(u, v; t)/\partial u^p\partial v^q$ at
$(u = 0, v = 0)$ and $i^{p+q}E[X(t)^p Y(t)^q]$. The moments of $(X(t), Y(t))$ are from
Example 7.3, and the initial condition results from properties of $(X(0), Y(0))$. ◊

Proof Ito’s formula applied to $\exp(i(uX(t) + vY(t)))$ with $(u, v) \in \mathbb{R}^2$ gives the
partial differential equation for $\varphi(u, v; t)$ by averaging and differentiation with respect to time.
Similar calculations have been performed in Example 5.15. ▲


7.2.3 Continuous Time Nonlinear Systems 

Let $X(t)$, $t \ge 0$, be an $\mathbb{R}^d$-valued stochastic process satisfying the stochastic differential and integral equations

$$dX(t) = a(X(t-))\,dt + b(X(t-))\,dY(t) \quad \text{and}$$

$$X(t) = X(0) + \int_0^t a(X(s-))\,ds + \int_0^t b(X(s-))\,dY(s), \quad t \ge 0, \tag{7.16}$$

where $a$ and $b$ are $(d,1)$ and $(d,d')$ matrices whose entries are real-valued Borel measurable functions and $Y(t)$ denotes an $\mathbb{R}^{d'}$-valued semimartingale (Sect. 5.5.1). It is assumed that the drift and diffusion coefficients $a$ and $b$ are such that the solution of (7.16) exists and is unique (Theorems 5.7, 5.8, 5.11, and 5.12).

We develop equations for moments, distributions, and other properties of $X(t)$ by Itô's formulas for continuous and arbitrary semimartingales in Theorems 5.3 and 5.4. The differential equation for the density of $X(t)$ is referred to as the Fokker-Planck equation.

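Let $\mu(q_1, \ldots, q_d; t) = E\big[\prod_{k=1}^d X_k(t)^{q_k}\big]$ be moments of order $q = q_1 + \cdots + q_d$, where $q_k \ge 0$ are integers, $\varphi(u;t) = E[\exp(iu'X(t))]$, $u = (u_1, \ldots, u_d) \in \mathbb{R}^d$, be the marginal characteristic function, and $f(x; t \mid x_0; 0) = \int_{\mathbb{R}^d} e^{-iu'x}\,\varphi(u;t)\,du/(2\pi)^d$, $x = (x_1, \ldots, x_d) \in \mathbb{R}^d$, be the marginal density of $X(t)$ at time $t \ge 0$ for a specified initial state $X(0) = x_0$ assumed to be independent of the driving noise.

Theorem 7.3 The moments and the characteristic functions of $X(t)$ in (7.16) driven by an $\mathbb{R}^{d'}$-valued Brownian motion $Y(t) = B(t)$ satisfy the differential equations

$$\dot\mu(q_1, \ldots, q_d; t) = \sum_{k=1}^d E\Big[a_k(X(t))\,\frac{\partial g(X(t))}{\partial x_k}\Big] + \frac{1}{2}\sum_{k,l=1}^d E\Big[\big(b(X(t))\,b(X(t))'\big)_{kl}\,\frac{\partial^2 g(X(t))}{\partial x_k\,\partial x_l}\Big],$$

$$\frac{\partial\varphi(u;t)}{\partial t} = i\sum_{k=1}^d u_k\,E\big[a_k(X(t))\,e^{iu'X(t)}\big] - \frac{1}{2}\sum_{k,l=1}^d u_k u_l\,E\Big[\big(b(X(t))\,b(X(t))'\big)_{kl}\,e^{iu'X(t)}\Big], \tag{7.17}$$

where $g(X(t)) = \prod_{k=1}^d X_k(t)^{q_k}$.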


Proof The expectations of Itô's formula applied to $g(X(t))$ and $\exp(iu'X(t))$ give (7.17) by differentiation with respect to time. For example, Itô's formula in (5.15) applied to $\exp(iu'X(t))$ gives

$$e^{iu'X(t)} - e^{iu'X(0)} = i\sum_{k=1}^d \int_0^t u_k\,e^{iu'X(s)}\,dX_k(s) - \frac{1}{2}\sum_{k,l=1}^d \int_0^t u_k u_l\,e^{iu'X(s)}\,d[X_k, X_l](s),$$

which yields the second formula in (7.17) by expectation and differentiation with respect to time since $dX_k(s) = a_k(X(s))\,ds + \sum_{r=1}^{d'} b_{kr}(X(s))\,dB_r(s)$ and $d[X_k, X_l](s) = \sum_{r=1}^{d'} b_{kr}(X(s))\,b_{lr}(X(s))\,ds = \big(b(X(s))\,b(X(s))'\big)_{kl}\,ds$.

Note that (7.17) delivers differential equations for the moments and the characteristic function of $X(t)$ only if the drift and diffusion coefficients in (7.16) are polynomials of $X(t)$. In this case, the expectations in (7.17) can be expressed as moments of $X(t)$ and partial derivatives of $\varphi(u;t)$. ▲

Suppose the drift and diffusion coefficients are polynomials of $X(t)$. The solution of the ordinary differential equation for $\mu(q_1, \ldots, q_d; t)$ only requires initial values $\mu(q_1, \ldots, q_d; 0)$ that can be calculated from the initial state $X(0)$, which is known. Generally, it is not possible to find the moments of $X(t)$ exactly since the moment equations in (7.17) form an infinite hierarchy, as illustrated by a subsequent example. The solution of the partial differential equation for $\varphi(u;t)$ requires initial and boundary conditions. Initial conditions result from properties of $X(0)$. The boundary condition $\varphi(0;t) = 1$ follows from the definition of the characteristic function. Additional boundary conditions needed for solution result from the following fact. If $h : \mathbb{R} \to \mathbb{R}$ is integrable, then $\hat h(u) = \int_{\mathbb{R}} e^{iux}\,h(x)\,dx \to 0$ as $|u| \to \infty$ ([25], Lemma 3, p. 513). This implies $\varphi(u;t) \to 0$ as $\|u\| \to \infty$ if $X(t)$ has a density and $\partial^q\varphi(u;t)/[\partial u_1^{q_1} \cdots \partial u_d^{q_d}] \to 0$ as $\|u\| \to \infty$ if $X(t)$ has finite moments of order $q = q_1 + \cdots + q_d$. Since $\|u\| \to \infty$ holds if at least one of the coordinates of $u$ converges to infinity, say $u_d$, and

$$\varphi(u;t) = \int_{\mathbb{R}^{d-1}} e^{i\sum_{k=1}^{d-1} u_k x_k}\,\tilde f(\tilde x)\,\Big[\int_{\mathbb{R}} e^{iu_d x_d}\,f(x_d \mid \tilde x)\,dx_d\Big]\,d\tilde x,$$

we have $\int_{\mathbb{R}} e^{iu_d x_d}\,f(x_d \mid \tilde x)\,dx_d \to 0$ as $|u_d| \to \infty$ since $f(x_d \mid \tilde x)$ is integrable, where $\tilde x = (x_1, \ldots, x_{d-1})$, $\tilde f(\tilde x)$ denotes the density of $\tilde X = (X_1, \ldots, X_{d-1})$, and $f(\cdot \mid \tilde x)$ is the density of $X_d \mid (\tilde X = \tilde x)$.

Example 7.6 Let $X(t)$ be the solution of $dX(t) = (\beta X(t) - X(t)^3)\,dt + \sigma\,dB(t)$, $t \ge 0$, with initial state $X(0)$ independent of $B(t)$, where $\beta$, $\sigma$ are real constants. The moments $\mu(q;t) = E[X(t)^q]$ satisfy the ordinary differential equation

$$\dot\mu(q;t) = q\beta\,\mu(q;t) - q\,\mu(q+2;t) + \frac{q(q-1)\sigma^2}{2}\,\mu(q-2;t), \quad t \ge 0,$$

with $\mu(q;0) = E[X(0)^q]$. It is not possible to find the moments of $X(t)$ exactly since the moment equations for $X(t)$ form an infinite hierarchy. For example, $\dot\mu(1;t) = \beta\,\mu(1;t) - \mu(3;t)$ depends on $\mu(3;t)$, which is not known. Closure methods have been proposed to solve approximately moment equations for nonlinear systems. The methods are heuristic, and can be unsatisfactory [26, 27]. ◊

Example 7.7 The stationary density of $X(t)$ in the previous example is an even function, so that all stationary odd order moments are zero. The non-zero stationary moments $\mu(q) = \lim_{t\to\infty}\mu(q;t)$ satisfy the recurrence formula

$$\mu(2(k+1)) = a\,\mu(2k) + (2k-1)\,b\,\mu(2(k-1)), \quad k = 1, 2, \ldots,$$

where $a = \beta$ and $b = \sigma^2/2$. This formula gives $\mu(4) = a\,\mu(2) + b$, $\mu(6) = (a^2 + 3b)\,\mu(2) + ab$, $\mu(8) = (a^3 + 8ab)\,\mu(2) + a^2 b + 5b^2$, and so on. Closure methods retain a finite number of moment equations and supplement them with additional relations between the unknown moments in these equations, so that the augmented set of equations can be solved. For example, let $k_0$ be an arbitrary closure level and $\mu(2(k_0+1)) = \xi\,\mu(2k_0)$ a closure technique, where $\xi > 0$ is a constant.

For $\beta = -1$ and $\sigma = 1$, $\mu(2)$ takes values in $I_2 = [0, 1/2]$ for $k_0 = 2$ and $I_4 = [0, 1/2]$ for $k_0 = 4$. It can be shown that $\lim_{k_0\to\infty} I_{k_0} = \{\mu(2)\}$ irrespective of the value of $\xi$ [27]. For $\beta = 1$ and $\sigma = 1$, $\mu(2)$ takes values in $(0, \infty)$ irrespective of the closure level and the value of $\xi$. This shows that the particular closure technique used for solution is irrelevant. The structure of the stochastic differential equation determines the success or the failure of closure techniques [26, 27]. ◊
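
The closure calculation of Example 7.7 can be explored numerically. The following Python sketch is illustrative only: the function names `closure_mu2` and `reference_mu2` are hypothetical, and the reference value is obtained by quadrature of the stationary density $f(x) \propto \exp[(\beta x^2 - x^4/2)/\sigma^2]$, which is assumed here on the basis of Example 7.12. Each even moment is written as $\mu(2k) = c_k\,\mu(2) + d_k$, so that the closure relation becomes a single linear equation for $\mu(2)$.

```python
import numpy as np

def closure_mu2(beta, sigma, k0, xi):
    """mu(2) implied by the closure mu(2(k0+1)) = xi * mu(2*k0)."""
    a, b = beta, sigma**2 / 2.0
    # mu(2k) = c[k] * mu(2) + d[k], with mu(0) = 1 and mu(2) unknown.
    c, d = [0.0, 1.0], [1.0, 0.0]
    for k in range(1, k0 + 1):
        # Recurrence mu(2(k+1)) = a*mu(2k) + (2k-1)*b*mu(2(k-1)).
        c.append(a * c[k] + (2 * k - 1) * b * c[k - 1])
        d.append(a * d[k] + (2 * k - 1) * b * d[k - 1])
    # Closure: c[k0+1]*mu2 + d[k0+1] = xi * (c[k0]*mu2 + d[k0]).
    return (xi * d[k0] - d[k0 + 1]) / (c[k0 + 1] - xi * c[k0])

def reference_mu2(beta, sigma, xmax=10.0, n=200001):
    """Quadrature of x^2 against the assumed stationary density."""
    x = np.linspace(-xmax, xmax, n)
    w = np.exp((beta * x**2 - x**4 / 2.0) / sigma**2)
    return float(np.sum(x**2 * w) / np.sum(w))

if __name__ == "__main__":
    for xi in (0.1, 1.0, 10.0):
        print(xi, closure_mu2(-1.0, 1.0, 4, xi), reference_mu2(-1.0, 1.0))
```

Varying `xi` and `k0` in this sketch shows how strongly the closure estimate of $\mu(2)$ depends on the closure constant at a given level.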
Example 7.8 The characteristic function $\varphi(u;t) = E[\exp(iuX(t))]$ of $X(t)$ in Example 7.6 satisfies the partial differential equation

$$\frac{\partial\varphi(u;t)}{\partial t} = \beta\,u\,\frac{\partial\varphi(u;t)}{\partial u} + u\,\frac{\partial^3\varphi(u;t)}{\partial u^3} - \frac{\sigma^2 u^2}{2}\,\varphi(u;t), \quad u \in \mathbb{R},$$

with the initial condition $\varphi(u;0) = E[\exp(iuX(0))]$. The boundary conditions can be $\varphi(0;t) = 1$, $\lim_{|u|\to\infty}\varphi(u;t) = 0$, and $\lim_{|u|\to\infty}\partial\varphi(u;t)/\partial u = 0$. Alternative boundary conditions are available in this case since $X(t)$ and $-X(t)$ have the same distribution, so that the density of $X(t)$ is an even function implying $E[X(t)] = 0$ provided it exists, $\varphi(u;t) = \varphi(-u;t)$, and $\varphi(u;t) \in \mathbb{R}$. Hence, it is sufficient to solve the partial differential equation for $\varphi(u;t)$ in $[0, \infty)$ with the boundary conditions $\varphi(0;t) = 1$, $\partial\varphi(u;t)/\partial u = 0$ at $u = 0$, and $\lim_{u\to\infty}\varphi(u;t) = 0$. The latter condition can be replaced for numerical calculations with $\varphi(\bar u;t) = 0$ for a sufficiently large $\bar u > 0$. ◊

In summary, differential equations can be obtained for the moments and characteristic function of $X(t)$ defined by (7.16) provided the entries of the drift and diffusion coefficients are polynomials of $X(t)$. Generally, the moment equations cannot be solved exactly since they form an infinite hierarchy, a very different situation from that of moment equations for the state of linear systems driven by polynomials of filtered Gaussian and/or Poisson processes (Example 7.3). The solution of the partial differential equation for the characteristic function requires both initial and boundary conditions. The specification of boundary conditions beyond $\varphi(0;t) = 1$ may pose difficulties since they involve properties of $X(t)$, which are not known. Numerical solutions are usually obtained under the assumption that the support of $\varphi(u;t)$ in the argument $u$ is a bounded rectangle $R$ in $\mathbb{R}^d$. Boundary conditions on $\varphi(u;t)$ and its derivatives for $\|u\| \to \infty$ are usually imposed on the boundary of $R$. For the process in Example 7.8 this means replacing $\lim_{|u|\to\infty}\varphi(u;t) = 0$ with $\varphi(\pm\bar u;t) = 0$, $\bar u > 0$. Solutions for increasing values of $\bar u$ need to be performed to determine whether the assumption that $X(t)$ has finite moments is valid and/or $R = (-\bar u, \bar u)$ is sufficiently large. Stable solutions for increasing values of $\bar u$ would indicate that the moments of $X(t)$ considered in the analysis are finite and that $R = (-\bar u, \bar u)$ has an adequate size.

The following theorem shows that the Fourier transform of the second differential equation in (7.17) is a partial differential equation for the density $f(x; t \mid x_0; 0)$ of $X(t) \mid (X(0) = x_0)$, referred to as the Fokker-Planck equation. The entries of the drift and diffusion coefficients are not required to be polynomials of $X(t)$.

Theorem 7.4 If the conditions $a_k(x)\,f(x; t \mid x_0; 0) \to 0$, $(b(x)\,b(x)')_{kl}\,f(x; t \mid x_0; 0) \to 0$, and $\partial[(b(x)\,b(x)')_{kl}\,f(x; t \mid x_0; 0)]/\partial x_k \to 0$, $k, l = 1, \ldots, d$, as $\|x\| \to \infty$ hold, and if $f(x; t \mid x_0; 0)$ and $\partial f(x; t \mid x_0; 0)/\partial t$ are continuous, then $f(x; t \mid x_0; 0)$ satisfies the partial differential equation

$$\frac{\partial f}{\partial t} = -\sum_{k=1}^d \frac{\partial}{\partial x_k}\big[a_k(x)\,f\big] + \frac{1}{2}\sum_{k,l=1}^d \frac{\partial^2}{\partial x_k\,\partial x_l}\big[(b(x)\,b(x)')_{kl}\,f\big] \tag{7.18}$$

with the initial condition $f(x; 0 \mid x_0; 0) = \delta(x - x_0)$ and boundary conditions depending on the objective of the analysis, which are discussed later in this section.

Proof The first three conditions impose constraints on the tails of $f(x; t \mid x_0; 0)$, and relate to the behavior of the characteristic function $\varphi(u;t)$ of $X(t)$ around $u = 0$. We show that the Fourier transform of (7.18) coincides with the second formula in (7.17).

The integral of the left side of (7.18) multiplied by $\exp(iu'x)$ is

$$\int_{\mathbb{R}^d} e^{iu'x}\,\frac{\partial f(x; t \mid x_0; 0)}{\partial t}\,dx = \frac{\partial}{\partial t}\int_{\mathbb{R}^d} e^{iu'x}\,f(x; t \mid x_0; 0)\,dx = \frac{\partial\varphi(u;t)}{\partial t},$$

by using Leibniz's rule in $R \times [0, \tau]$, which applies since $e^{iu'x}\,\partial f/\partial t$ and $e^{iu'x} f$ are continuous in $R \times [0, \tau]$, where $R$ is an arbitrary but bounded rectangle in $\mathbb{R}^d$, $[0, \tau]$ denotes a bounded range for $t$, and $f$ is a shorthand notation for $f(x; t \mid x_0; 0)$.

The Fourier transform of the first term on the right side of (7.18) is

$$-\int_{\mathbb{R}^d} e^{iu'x}\,\frac{\partial(a_k f)}{\partial x_k}\,dx = -\int_{\mathbb{R}^{d-1}} \prod_{r=1, r\ne k}^{d} e^{iu_r x_r}\,dx_r \Big[\int_{\mathbb{R}} e^{iu_k x_k}\,\frac{\partial(a_k f)}{\partial x_k}\,dx_k\Big] = iu_k \int_{\mathbb{R}^d} e^{iu'x}\,a_k f\,dx = iu_k\,E\big[e^{iu'X(t)}\,a_k(X(t))\big]$$

since $\int_{\mathbb{R}} e^{iu_k x_k}\,\partial(a_k f)/\partial x_k\,dx_k = e^{iu_k x_k}\,a_k f\,\big|_{-\infty}^{\infty} - \int_{\mathbb{R}} iu_k\,e^{iu_k x_k}\,a_k f\,dx_k$ by integration by parts and $e^{iu_k x_k}\,a_k f\,\big|_{-\infty}^{\infty} = 0$ by assumption. Similar calculations show that the integral of the second term on the right side of (7.18) multiplied by $\exp(iu'x)$ over $\mathbb{R}^d$ coincides with the second term on the right side of (7.17). ▲

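An alternative form of (7.18) is

$$\frac{\partial f}{\partial t} = -\sum_{k=1}^d \frac{\partial\lambda_k(x;t)}{\partial x_k}, \quad \text{where} \quad \lambda_k(x;t) = a_k f - \frac{1}{2}\sum_{l=1}^d \frac{\partial\big((bb')_{kl}\,f\big)}{\partial x_l}. \tag{7.19}$$

The vector $\lambda(x;t) \in \mathbb{R}^d$ with coordinates $\{\lambda_k(x;t)\}$ is referred to as probability current. Let $p_D(t) = \int_D f(x; t \mid x_0; 0)\,dx$, where $D$ is an open subset of $\mathbb{R}^d$ with boundary $\partial D$. Then

$$\frac{\partial p_D(t)}{\partial t} = -\int_D \sum_{k=1}^d \frac{\partial\lambda_k(x;t)}{\partial x_k}\,dx = -\int_{\partial D} \lambda(x;t)\cdot n(x)\,d\sigma(x) \tag{7.20}$$

by the divergence theorem ([28], p. 116), where $n(x)$ denotes the exterior normal at $x \in \partial D$ and $d\sigma(x)$ is an infinitesimal surface element on $\partial D$. Note that $-\lambda(x;t)\cdot n(x)\,d\sigma(x)$ represents the rate of change of $p_D(t)$ caused by probability flow through the element of area $d\sigma(x)$ of $\partial D$.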

Definition 7.2 Let $T = \inf\{t > 0 : X(t) \notin D,\ X(0) = x_0 \in D\}$ be a stopping time denoting the first time $X(t)$ starting at $x_0 \in D$ exits $D$. If $T = \infty$ a.s., $\partial D$ is an inaccessible boundary, which is called natural if $X(t)$ never reaches it and attracting if $\lim_{t\to\infty} X(t) \in \partial D$. If $T < \infty$ a.s., $\partial D$ is an accessible boundary, which is called reflecting if $\lambda(x;t)\cdot n(x) = 0$, $x \in \partial D$, and absorbing if $f(x; t \mid x_0; 0) = 0$, $x \in \partial D$.

Fig. 7.2 Analytical (dotted line) and finite difference (solid line) solutions for $p_D(t)$

Example 7.9 The solution $X(t)$ of the stochastic differential equation $dX(t) = cX(t)\,dt + \sigma X(t)\,dB(t)$, $t \ge 0$, referred to as the geometric Brownian motion, is $X(t) = x_0\exp[(c - \sigma^2/2)t + \sigma B(t)]$, where $X(0) = x_0$ denotes the initial state (Example 5.11). Since $B(t)/t \to 0$ a.s. as $t \to \infty$ ([29], Sect. 6.4), $X(t)$ behaves as $X(0)\exp[(c - \sigma^2/2)t]$ for large $t$. If $X(0) \ne 0$ and $c - \sigma^2/2 < 0$, then $X(t) \to 0$ a.s. without reaching 0 in finite time, so that $x = 0$ is an inaccessible, attracting boundary. ◊
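
The attracting character of $x = 0$ in Example 7.9 can be illustrated by sampling the closed-form solution directly. The sketch below is a minimal illustration: the function name `gbm_samples` and the parameter values ($c = 0.1$, $\sigma = 1$, $t = 200$) are assumptions chosen so that $c > 0$ while $c - \sigma^2/2 < 0$.

```python
import numpy as np

def gbm_samples(x0, c, sigma, t, n_samples, seed=0):
    """Samples of X(t) = x0 * exp((c - sigma^2/2) t + sigma B(t))."""
    rng = np.random.default_rng(seed)
    b_t = np.sqrt(t) * rng.standard_normal(n_samples)  # B(t) ~ N(0, t)
    return x0 * np.exp((c - sigma**2 / 2.0) * t + sigma * b_t)

if __name__ == "__main__":
    # Illustrative values: positive drift c, but c - sigma^2/2 = -0.4 < 0.
    x = gbm_samples(x0=1.0, c=0.1, sigma=1.0, t=200.0, n_samples=10000)
    print("fraction of samples below 1e-3:", np.mean(x < 1e-3))
    print("all samples strictly positive:", bool(np.all(x > 0)))
```

The samples concentrate near 0 at large times yet remain strictly positive, which is consistent with an inaccessible, attracting boundary.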

Example 7.10 The density $f(x;t)$ of a Brownian motion $B(t)$, $t \ge 0$, starting at $B(0) = 0$ satisfies the Fokker-Planck equation $\partial f(x;t)/\partial t = (1/2)\,\partial^2 f(x;t)/\partial x^2$ with the initial condition $f(x;0) = \delta(x)$. Let $T = \inf\{t > 0 : B(t) \ge a\}$, $a > 0$, be the first time $B(t)$ exits $D = (-\infty, a)$. The probability that $B(t)$ does not exit $D$ in a time interval $[0, t]$ is $p_D(t) = P(T > t) = 2\Phi(a/\sqrt{t}) - 1$. This probability can also be obtained by solving the Fokker-Planck equation for $f(x;t)$ numerically with the absorbing boundary $f(a;t) = 0$. Figure 7.2 shows with dotted and solid lines the analytical expression of $p_D(t)$ and a finite difference solution of $\partial f(x;t)/\partial t = (1/2)\,\partial^2 f(x;t)/\partial x^2$ under appropriate boundary conditions. ◊

Proof We have $P(B(t) > a) = P(B(t) > a \mid T \le t)\,P(T \le t) + P(B(t) > a \mid T > t)\,P(T > t) = P(B(t) > a \mid T \le t)\,P(T \le t)$ since $P(B(t) > a \mid T > t) = 0$. If $T \le t$, the Brownian motion has reached $x = a$ at a time prior to $t$. The symmetry of the Brownian motion implies that the events $\{B(t) > a \mid T \le t\}$ and $\{B(t) < a \mid T \le t\}$ are equally likely, so that $P(B(t) > a) = P(T \le t)/2$ or $P(T \le t) = 2P(B(t) > a) = 2\Phi(-a/\sqrt{t})$. ▲
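
A finite difference calculation of the type mentioned in Example 7.10 can be sketched as follows. This is not the scheme used to produce Fig. 7.2, which is not described in the text; it is a minimal explicit scheme under stated assumptions: the domain $(-\infty, a)$ is truncated at a hypothetical lower bound `x_min`, the delta initial condition is replaced by a discrete spike of mass 1 at $x = 0$, and the function name `survival_fd` is illustrative.

```python
import math
import numpy as np

def survival_fd(a=1.0, t_end=1.0, x_min=-8.0, dx=0.05):
    """p_D(t_end) from df/dt = (1/2) d2f/dx2 with f(a; t) = 0."""
    n = int(round((a - x_min) / dx))
    x = x_min + dx * np.arange(n + 1)        # grid on [x_min, a]
    dt = 0.4 * dx**2                         # explicit scheme needs dt <= dx^2
    steps = int(round(t_end / dt))
    f = np.zeros(n + 1)
    f[np.argmin(np.abs(x))] = 1.0 / dx       # discrete delta at x = 0
    r = dt / (2.0 * dx**2)
    for _ in range(steps):
        f[1:-1] += r * (f[2:] - 2.0 * f[1:-1] + f[:-2])
        f[0] = 0.0                           # truncation boundary (assumption)
        f[-1] = 0.0                          # absorbing boundary at x = a
    return float(np.sum(f) * dx)

def survival_exact(a, t):
    """2 * Phi(a / sqrt(t)) - 1, written with the error function."""
    return math.erf(a / math.sqrt(2.0 * t))

if __name__ == "__main__":
    print(survival_fd(), survival_exact(1.0, 1.0))
```

The time step is tied to the square of the space step so that the explicit update remains stable; an implicit scheme would remove this restriction at the cost of solving a tridiagonal system at each step.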
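Example 7.11 The partial differential equations for the characteristic and density functions of $X(t)$ defined by $dX(t) = -\rho X(t)\,dt + dB(t)$ and $dX(t) = -\rho X(t)\,dt + dC(t)$ are

Gaussian noise:

$$\frac{\partial\varphi}{\partial t} = -\rho\,u\,\frac{\partial\varphi}{\partial u} - \frac{u^2}{2}\,\varphi, \qquad \frac{\partial f}{\partial t} = \rho\,\frac{\partial(xf)}{\partial x} + \frac{1}{2}\,\frac{\partial^2 f}{\partial x^2},$$

Poisson noise:

$$\frac{\partial\varphi}{\partial t} = -\rho\,u\,\frac{\partial\varphi}{\partial u} + \lambda\,(\varphi_{Y_1}(u) - 1)\,\varphi, \qquad \frac{\partial f}{\partial t} = \rho\,\frac{\partial(xf)}{\partial x} + \lambda\sum_{r=1}^{\infty} \frac{(-1)^r E[Y_1^r]}{r!}\,\frac{\partial^r f}{\partial x^r},$$

where $\rho > 0$ is a constant, $B(t)$ denotes a Brownian motion, $C(t) = \sum_{k=1}^{N(t)} Y_k$ is a compound Poisson process, $N(t)$ is a Poisson process with intensity $\lambda > 0$, $\{Y_k\}$ are iid random variables with bounded moments of any order, and $\varphi_{Y_1}$ is the characteristic function of $Y_1$ (Exercise 7.9). ◊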

Example 7.12 Suppose the real-valued diffusion process $X(t)$ defined by $dX(t) = a(X(t))\,dt + b(X(t))\,dB(t)$ admits a stationary solution. The stationary density of $X(t)$ is $f(x) \propto \exp[2\beta(x)]/b(x)^2$, where $\beta'(x) = a(x)/b(x)^2$. Additional examples can be found in [30] (Chap. 5) and [4] (Sect. 7.3.1). ◊

Proof The stationary version of (7.18) is $d\big[-a(x)f(x) + (1/2)\,d(b(x)^2 f(x))/dx\big]/dx = 0$, so that $-a(x)f(x) + (1/2)\,d(b(x)^2 f(x))/dx$ is a constant that must be zero under the assumptions in Theorem 7.4. The solution of $-a(x)f(x) + (1/2)\,d(b(x)^2 f(x))/dx = 0$ is the stationary density of $X(t)$. For example, $f(x) \propto \exp[(\alpha x^2 + \beta x^4/2)/\sigma^2]$, $x \in \mathbb{R}$, for $a(x) = \alpha x + \beta x^3$ and $b(x) = \sigma$ a constant. There is no stationary density for $\alpha > 0$ and $\beta > 0$. If $\alpha > 0$ and $\beta < 0$, the stationary density $f(x)$ exists and has two modes. ▲
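
The stationary density formula of Example 7.12 lends itself to a direct numerical check. The following sketch evaluates $f(x) \propto \exp[2\beta(x)]/b(x)^2$ on a grid by trapezoidal integration of $a(x)/b(x)^2$; the function name `stationary_density`, the grid, and the parameter values $\alpha = 1$, $\beta = -1$, $\sigma = 1$ are illustrative assumptions.

```python
import numpy as np

def stationary_density(x, drift, diffusion):
    """f(x) proportional to exp(2*beta(x)) / b(x)^2, beta' = a / b^2."""
    a, b = drift(x), diffusion(x)
    g = a / b**2
    # Cumulative trapezoidal rule for beta(x), up to an additive constant.
    beta = np.concatenate(([0.0], np.cumsum(0.5 * (g[1:] + g[:-1]) * np.diff(x))))
    f = np.exp(2.0 * (beta - beta.max())) / b**2
    # Normalize so that the trapezoidal integral of f over the grid is 1.
    return f / np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x))

if __name__ == "__main__":
    alpha, beta_c, sigma = 1.0, -1.0, 1.0    # illustrative values
    x = np.linspace(-3.0, 3.0, 601)
    f = stationary_density(x,
                           lambda v: alpha * v + beta_c * v**3,
                           lambda v: sigma * np.ones_like(v))
    pos = x > 0
    print("mode on the positive axis:", x[pos][np.argmax(f[pos])])
```

For these values the density is bimodal with modes near $x = \pm\sqrt{-\alpha/\beta} = \pm 1$, in line with the closing remark of the proof.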


The dimension of X(t) is large in most applications, so that it is difficult to find its 
properties by solving the differential equations in (7.17) and (7.18). A broad range 
of approximate methods has been proposed to characterize X(t), for example, pertur- 
bation, Taylor series, Neumann series, equivalent linearization, stochastic averaging, 
and other methods ([30], Chap. 6, [4], Sect. 4.9.4, [31]). 


7.3 Stochastic Difference Equations with Random Coefficients 

Difference equations can be obtained from stochastic differential equations with random coefficients by time discretization or can be constructed directly as discrete time models for various phenomena ([32], Sect. 1.1). Following some general considerations (Sect. 7.3.1), we discuss methods for solving stochastic difference equations with arbitrary random coefficients (Sects. 7.3.2-7.3.5) and random coefficients with small uncertainty (Sects. 7.3.6-7.3.7).




7.3.1 General Considerations 


Let

$$X_{n+1} = X_n + a(X_n, Y_n, n\Delta t)\,\Delta t + b(X_n, Y_n, n\Delta t)\,W_n, \quad n = 0, 1, \ldots, \tag{7.21}$$

be a discrete time version of (7.1) obtained by approximating the differentials in this equation by forward finite differences, where $\Delta t > 0$ denotes the time step, $X_n = X(n\Delta t)$, $Y_n = Y(n\Delta t)$, $S(t) = B(t)$, and $W_n = B((n+1)\Delta t) - B(n\Delta t)$ is a $d'$-dimensional vector with independent $N(0, \Delta t)$ coordinates. It is assumed that the drift and the diffusion coefficients in (7.1) are nearly constant in $\Delta t$. The time series $\{X_n\}$ conditional on $\{Y_0, Y_1, \ldots\}$ is a discrete time, continuous state Markov process whose transition probabilities can be obtained from the observation that $X_{n+1} \mid X_n$ is a Gaussian vector with mean $X_n + a(X_n, Y_n, n\Delta t)\,\Delta t$ and covariance matrix $b(X_n, Y_n, n\Delta t)\,E[W_n W_n']\,b(X_n, Y_n, n\Delta t)'$.

The recurrence formula in (7.21) written in the form

$$X_{n+1} = A_n(X_n) + B_n(X_n)\,W_n, \quad n = 0, 1, 2, \ldots, \tag{7.22}$$

defines an autoregressive model with random coefficients, where $A_n(X_n) = X_n + a(X_n, Y_n, n\Delta t)\,\Delta t$ and $B_n(X_n) = b(X_n, Y_n, n\Delta t)$. The model in (7.22) is said to be linear with multiplicative noise if $A_n(X_n)$ and $B_n(X_n)$ are linear in $X_n$, and linear with additive noise if $A_n(X_n)$ is linear in $X_n$ and $B_n(X_n)$ does not depend on the state.

Example 7.13 Let $X(t)$ be a real-valued process satisfying the linear equation

$$dX(t) = a(Y(t), t)\,X(t)\,dt + b(Y(t), t)\,X(t)\,dB(t), \quad t \ge 0. \tag{7.23}$$

If $a(Y(t), t)$ and $b(Y(t), t)$ are replaced by deterministic constants, then $X(t)$ is a geometric Brownian motion. If $a(Y(t), t)$ and $b(Y(t), t)\,X(t)$ are replaced by deterministic constants, then $X(t)$ is an Ornstein-Uhlenbeck process. The discrete time version of (7.23) is

$$X_{n+1} = A_n X_n + B_n X_n W_n, \quad n = 0, 1, \ldots, \tag{7.24}$$

where $A_n$ and $B_n$ are random matrices depending on $Y(t)$. ◊

Most studies on difference equations with random coefficients are limited to linear models with additive noise and state independent coefficients. Autoregressive models with random coefficients and state $X_n \in \mathbb{R}$ defined by

$$X_{n+1} = A_n X_n + W_n, \quad n = 0, 1, \ldots, \tag{7.25}$$

have been studied extensively, where $\{A_n\}$ denotes a real-valued random sequence that is independent of $\{W_n\}$ and $X_0$, and $\{W_n\}$ is an iid sequence. For example, conditions for the existence of the stationary solution and other asymptotic properties of $\{X_n\}$ in (7.25) are established in [33]. Moment equations can be found in [12] for the vector version of (7.25) without driving noise under the assumption that $\{A_n\}$ is a Markov chain with a finite number of states. In this case, the augmented state $(X_n, A_n)$ is a Markov chain whose evolution in time is defined by a nonlinear recurrence formula driven by Gaussian white noise, so that it is not Gaussian. Properties of the state of first order autoregressive models with random coefficients whose distributions depend on uncertain parameters are examined in [13, 34, 35] within a Bayesian framework. The remainder of this section summarizes some of the results in these references and presents some related facts.

Theorem 7.5 If $\{A_n\}$ in (7.25) are iid random variables with $|A_0| < 1$ a.s. that are independent of the iid sequence $\{W_n\}$ with mean 0 and variance 1, and $\{A_n\}$ and $\{W_n\}$ are independent of $X_0$, then

$$\mu_n = E[X_n] = (E[A_0])^n\,\mu_0 \to 0, \quad n \to \infty,$$

$$\gamma_n = \mathrm{Var}[X_n] = (E[A_0^2])^n\,\gamma_0 + \sum_{r=0}^{n-1} (E[A_0^2])^r + \mathrm{Var}[A_0]\,\mu_0^2 \sum_{r=0}^{n-1} (E[A_0])^{2(n-1-r)}\,(E[A_0^2])^r \to \frac{1}{1 - E[A_0^2]}, \quad n \to \infty,$$

$$c(p,q) = \mathrm{Cov}[X_p, X_q] = (E[A_0])^{|p-q|}\,\gamma_{p\wedge q} \to \frac{(E[A_0])^{|p-q|}}{1 - E[A_0^2]}, \quad p, q \to \infty, \tag{7.26}$$

so that $\{X_n\}$ becomes weakly stationary as time increases indefinitely.

Proof The expectation of (7.25) gives $\mu_{n+1} = E[X_{n+1}] = E[A_n X_n] = E[A_n]\,E[X_n] = E[A_0]\,\mu_n$ since $X_n$ is a function of $(A_{n-1}, \ldots, A_0, W_{n-1}, \ldots, W_0)$ and $E[W_n] = 0$. Repeated applications of the recurrence formula $\mu_{n+1} = E[A_0]\,\mu_n$ give the expression of $\mu_n$. We have $\lim_{n\to\infty}\mu_n = 0$ since $|A_0| < 1$ a.s., so that $|E[A_0]| < 1$.

Subtract $\mu_{n+1} = E[A_0]\,\mu_n$ from (7.25) to obtain $\tilde X_{n+1} = A_n\tilde X_n + \tilde A_n\mu_n + W_n$, where $\tilde X_n = X_n - E[X_n]$ and $\tilde A_n = A_n - E[A_n]$. The expectation of the square of this equation gives the recurrence formula $\gamma_{n+1} = E[A_0^2]\,\gamma_n + \mathrm{Var}[A_0]\,\mu_n^2 + 1$ since $E[W_n^2] = 1$, $E[A_n\tilde A_n\tilde X_n] = 0$, $E[A_n\tilde X_n W_n] = 0$, and $E[\tilde A_n W_n] = 0$ by assumptions. Repeated applications of the recurrence formula for $\gamma_n$ give the expression of the variance of $X_n$ in (7.26). The first term in the expression of $\gamma_n$ converges to 0 as $n \to \infty$ since $E[A_0^2] < 1$. The second term is a geometric progression with sum $1/(1 - E[A_0^2])$ as $n \to \infty$. Let $S_{n-1}$ denote the summation in the third term. Since $0 \le S_{n-1} \le \sum_{r=0}^{n-1}(E[A_0^2])^r$ and $\sum_{r=0}^{\infty}(E[A_0^2])^r$ is convergent, $\lim_{n\to\infty} S_n = S$ exists and is finite. We also have $S_n = (E[A_0])^{2n} + E[A_0^2]\,S_{n-1}$, implying $S = 0 + E[A_0^2]\,S$ by taking the limit $n \to \infty$, so that $S = 0$ since $E[A_0^2] \ne 1$.

Multiply $\tilde X_{n+1} = A_n\tilde X_n + \tilde A_n\mu_n + W_n$ by $\tilde X_n$. The expectation of the resulting expression is $c(n, n+1) = E[\tilde X_{n+1}\tilde X_n] = E[A_0]\,\gamma_n$, that is, a formula for the lag 1 covariance function at times $n$ and $n+1$. In a similar manner we find

$$c(p,q) = E[\tilde X_p\tilde X_q] = (E[A_0])^{|p-q|}\,\gamma_{p\wedge q},$$

which converges to the stated result as $p, q \to \infty$. ▲

Fig. 7.3 Estimates of the mean and variance functions of $X_n$ (solid and dotted lines) for $(\rho', \rho'') = (-0.2, 0.9)$

Example 7.14 Suppose $A_n$ in (7.25) is uniformly distributed in $(\rho', \rho'')$, $-1 < \rho' < \rho'' < 1$, and $W_n \sim N(0, 1)$. Figure 7.3 shows with solid and dotted lines estimates of the mean $\mu_n$ and the variance $\gamma_n$ as a function of $n$ for $(\rho', \rho'') = (-0.2, 0.9)$ based on 100000 independent samples of $X_n$. The asymptotic values of the mean and variance in (7.26) are $\lim_{n\to\infty}\mu_n = 0$ and $\lim_{n\to\infty}\gamma_n = 1/(1 - E[A_0^2]) = 1.2875$, in agreement with Monte Carlo estimates. In this illustration, $\mu_n$ and $\gamma_n$ converge rapidly to their stationary values. ◊

Proof The second moment of $A_0$ is $E[A_0^2] = \int_{\rho'}^{\rho''} u^2\,du/(\rho'' - \rho') = ((\rho')^2 + \rho'\rho'' + (\rho'')^2)/3 = 0.2233$ for $\rho' = -0.2$ and $\rho'' = 0.9$, so that $\lim_{n\to\infty}\gamma_n = 1.2875$. If $A_n$ is deterministic, for example, $\rho' = \rho'' = 0.9$, the asymptotic mean and variance of $X_n$ are $\lim_{n\to\infty}\mu_n = 0$ and $\lim_{n\to\infty}\gamma_n = 1/(1 - 0.9^2) = 5.26$. This observation suggests that the variance of $X_n$ is sensitive to the uncertainty in $A_n$. ▲
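
The Monte Carlo estimates of Example 7.14 and the moment recurrences used in the proof of Theorem 7.5 can be compared with a short script. The sketch below is illustrative: the function names are hypothetical, and the deterministic initial state $X_0 = 1$ (so that $\mu_0 = 1$, $\gamma_0 = 0$) is an assumption, since the text does not state the initial state used for Fig. 7.3.

```python
import numpy as np

def simulate_ar1(rho_lo, rho_hi, n_steps, n_samples, x0=1.0, seed=0):
    """Samples of X_n for X_{n+1} = A_n X_n + W_n, A_n ~ U(rho_lo, rho_hi)."""
    rng = np.random.default_rng(seed)
    x = np.full(n_samples, x0)
    for _ in range(n_steps):
        a = rng.uniform(rho_lo, rho_hi, n_samples)
        x = a * x + rng.standard_normal(n_samples)   # W_n ~ N(0, 1)
    return x

def moment_recursion(rho_lo, rho_hi, n_steps, mu0=1.0, gamma0=0.0):
    """mu_{n+1} = E[A0] mu_n; gamma_{n+1} = E[A0^2] gamma_n + Var[A0] mu_n^2 + 1."""
    m1 = 0.5 * (rho_lo + rho_hi)
    m2 = (rho_lo**2 + rho_lo * rho_hi + rho_hi**2) / 3.0
    mu, gamma = mu0, gamma0
    for _ in range(n_steps):
        gamma = m2 * gamma + (m2 - m1**2) * mu**2 + 1.0  # uses mu_n, not mu_{n+1}
        mu = m1 * mu
    return mu, gamma

if __name__ == "__main__":
    x = simulate_ar1(-0.2, 0.9, n_steps=50, n_samples=100000)
    print("Monte Carlo:", x.mean(), x.var())
    print("recurrence :", moment_recursion(-0.2, 0.9, 50))
```

Setting `rho_lo` and `rho_hi` close to a common value reproduces the deterministic-coefficient limit discussed in the proof.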

Theorem 7.6 Let $\{X_n, n = 0, 1, \ldots\}$ be an $\mathbb{R}^d$-valued random sequence defined by $X_{n+1} = A_n X_n$, where $A_n$ are iid $(d,d)$ random matrices. If the sequence $\{A_n\}$ is independent of the initial state $X_0$, the second moment properties of this sequence can be calculated from

$$E[X_{n+1}] = E[A_n]\,E[X_n],$$

$$E[X_{n+1}X_{n+1}'] = E\{A_n\,E[X_n X_n']\,A_n'\}, \quad \text{and}$$

$$E[X_n X_{n+p}'] = E[X_n X_n']\,E[A_n' \cdots A_{n+p-1}'] \tag{7.27}$$

for $n = 0, 1, \ldots$ and arbitrary integer $p \ge 1$.

Proof The expectation of $X_{n+1} = A_n X_n$ is $E[X_{n+1}] = E[A_n X_n] = E[A_n]\,E[X_n]$ since $X_n$ is a function of $(A_{n-1}, \ldots, A_0, X_0)$ and $A_n$ is independent of $(A_{n-1}, \ldots, A_0, X_0)$ by assumption. Similar considerations and properties of conditional expectation give $E[X_{n+1}X_{n+1}'] = E\{E[A_n X_n (A_n X_n)' \mid A_n]\} = E\{A_n\,E[X_n X_n']\,A_n'\}$. Since $X_{n+p} = A_{n+p-1} \cdots A_n X_n$, we have $E[X_n X_{n+p}'] = E[X_n (A_{n+p-1} \cdots A_n X_n)']$, which gives the last formula in (7.27) by properties of $X_n$. The formulas in (7.27) have been derived in [12] following a different approach. ▲
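
The second formula in (7.27) can be checked numerically when $A_n$ takes finitely many values. In the sketch below, which is an illustration under stated assumptions rather than the approach of [12], the matrices `mats`, the probabilities `probs`, and both function names are hypothetical. The recursion is compared with a brute-force enumeration of all coefficient sequences for a deterministic initial state.

```python
import itertools
import numpy as np

def second_moment_recursion(mats, probs, m0, n_steps):
    """Iterate E[X_{n+1} X_{n+1}'] = E{A_n E[X_n X_n'] A_n'} for discrete iid A_n."""
    m = np.array(m0, dtype=float)
    for _ in range(n_steps):
        m = sum(p * (a @ m @ a.T) for a, p in zip(mats, probs))
    return m

def second_moment_enumeration(mats, probs, x0, n_steps):
    """Brute-force E[X_n X_n'] by summing over all coefficient sequences."""
    total = np.zeros((len(x0), len(x0)))
    for path in itertools.product(range(len(mats)), repeat=n_steps):
        x, w = np.array(x0, dtype=float), 1.0
        for j in path:
            x, w = mats[j] @ x, w * probs[j]
        total += w * np.outer(x, x)
    return total

if __name__ == "__main__":
    mats = [np.array([[0.5, 0.1], [0.0, 0.3]]),
            np.array([[0.2, -0.4], [0.3, 0.1]])]   # illustrative values
    probs = [0.3, 0.7]
    x0 = [1.0, 2.0]
    print(second_moment_recursion(mats, probs, np.outer(x0, x0), 3))
    print(second_moment_enumeration(mats, probs, x0, 3))
```

The enumeration grows exponentially with the number of steps, whereas the recursion requires one pass per step, which is the practical value of (7.27).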

Example 7.15 Suppose $A_n$ in Theorem 7.6 is a stationary Markov chain with states the $(d,d)$ deterministic matrices $\{a_1, \ldots, a_m\}$, transition probabilities $p_{sk} = P(A_n = a_k \mid A_{n-1} = a_s)$, and stationary probability $\pi_k = P(A_n = a_k)$, $k = 1, \ldots, m$. The state expectation is $E[X_{n+1}] = \sum_{k=1}^m a_k\,\pi_k \sum_{s=1}^m E[X_n \mid A_{n-1} = a_s]\,p_{ks}$. ◊

Proof The probability of the event $\{A_n = a_k, A_{n-1} = a_s\}$ is $p_{sk}\,\pi_s$. The expectation of $X_{n+1} = A_n X_n$ conditional on this event is $E[X_{n+1} \mid A_n = a_k, A_{n-1} = a_s] = a_k\,E[X_n \mid A_n = a_k, A_{n-1} = a_s]$, so that

$$E[X_{n+1}] = \sum_{k,s=1}^m a_k\,E[X_n \mid A_n = a_k, A_{n-1} = a_s]\,p_{sk}\,\pi_s = \sum_{k,s=1}^m a_k\,E[X_n \mid A_{n-1} = a_s]\,p_{sk}\,\pi_s = \sum_{k=1}^m a_k\,\pi_k \sum_{s=1}^m E[X_n \mid A_{n-1} = a_s]\,p_{ks},$$

where we used $p_{sk}\,\pi_s = p_{ks}\,\pi_k$ and the fact that $X_n$ does not depend on $A_n$. ▲

The random coefficients in (7.21) may model, for example, the degradation of a system's properties in time under exposure to random actions or simply our limited knowledge/information on the system behavior. In the latter case, both the functional form of (7.21) and the properties of the random coefficients in this equation are uncertain. Prior information on both the functional form of (7.21) and features of its random coefficients can be incorporated within a Bayesian framework to identify an optimal functional form for (7.21) and corresponding distributions of the random parameters of this form. Let $\{X_n\}$ be a real-valued autoregressive Gaussian sequence of unknown order $k \ge 1$, and consider a collection $\mathcal{M}_k$, $k = 1, 2, \ldots, m$, of such sequences whose members are defined by

$$X_{n+1} = \beta_0 + \beta_1 X_n + \cdots + \beta_k X_{n-k+1} + W_n, \tag{7.28}$$

where $\beta_k = (\beta_0, \beta_1, \ldots, \beta_k)'$ are unknown coefficients, $k \ge 1$ is an integer, and $\{W_n\}$ are independent $N(0, 1/h_k)$ with unknown variance $1/h_k$, $h_k > 0$. Our objectives are to identify the optimal model for $\{X_n\}$ and find distributions of its uncertain parameters based on (1) observations consisting of $n > k$ consecutive, error free readings $(z_1, \ldots, z_n)$ of $\{X_n\}$ and (2) prior information consisting of prior densities $f_k'(\beta_k, h_k)$ of the uncertain parameters $\beta_k$ and $h_k$ in (7.28) and the prior probabilities $p_k'$ of the models $\mathcal{M}_k$, $k = 1, \ldots, m$, satisfying the conditions $p_k' > 0$, $k = 1, \ldots, m$, and $\sum_{k=1}^m p_k' = 1$. The densities of $(\beta_k, h_k)$ that account for both prior information and observations are denoted by $f_k''(\beta_k, h_k)$ and are called posterior densities. Corresponding model probabilities are denoted by $p_k''$ [36, 37]. An extensive discussion on the selection of prior densities can be found in [38] (Chaps. II and III). Physics and other knowledge can be used to select a set $\mathcal{M}_k$, $k = 1, 2, \ldots, m$, of competing models and their prior probabilities.

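The regression form of $\mathcal{M}_k$ for the observation vector $(z_1, \ldots, z_n)$ is

$$V_k = a_k\,\beta_k + \frac{1}{\sqrt{h_k}}\,G_k, \tag{7.29}$$

where

$$a_k = \begin{bmatrix} 1 & z_k & \cdots & z_1 \\ 1 & z_{k+1} & \cdots & z_2 \\ \vdots & \vdots & & \vdots \\ 1 & z_{n-1} & \cdots & z_{n-k} \end{bmatrix}, \tag{7.30}$$

$V_k = (X_{k+1}, \ldots, X_n)'$, and $G_k$ is an $(n-k)$-dimensional vector with independent $N(0, 1)$ entries.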

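Theorem 7.7 If $f_k'$ is a normal-gamma density with parameters $(\beta_k', \gamma_k', \rho_k', \nu_k')$ and $n > k$, then the posterior density $f_k''$ of $(\beta_k, h_k)$ is also a normal-gamma density with parameters $(\beta_k'', \gamma_k'', \rho_k'', \nu_k'')$ given by

$$(\gamma_k'')^{-1} = (\gamma_k')^{-1} + a_k' a_k,$$

$$\beta_k'' = \gamma_k''\,\big[(\gamma_k')^{-1}\beta_k' + a_k' z_k\big],$$

$$\rho_k''\,\nu_k'' = \rho_k'\,\nu_k' + (\beta_k')'(\gamma_k')^{-1}\beta_k' - (\beta_k'')'(\gamma_k'')^{-1}\beta_k'' + z_k' z_k,$$

$$\nu_k'' = \nu_k' + n - k, \tag{7.31}$$

where $z_k = (z_{k+1}, \ldots, z_n)'$ is a column vector.

Proof Under our assumption on $f_k'(\beta_k, h_k)$, we have that $\beta_k \mid h_k \sim N(\beta_k', \gamma_k'/h_k)$ is a Gaussian vector with mean $\beta_k'$ and covariance matrix $\gamma_k'/h_k$ and $h_k \sim G2(\rho_k', \nu_k')$ is gamma-2 distributed with parameters $(\rho_k', \nu_k')$, so that ([39], p. 226)

$$f_k'(\beta_k, h_k) = \frac{h_k^{(k+1)/2}}{(2\pi)^{(k+1)/2}\,|\gamma_k'|^{1/2}} \exp\Big[-\frac{h_k}{2}\,(\beta_k - \beta_k')'(\gamma_k')^{-1}(\beta_k - \beta_k')\Big]\; \frac{(\rho_k'\nu_k'/2)^{\nu_k'/2}}{\Gamma(\nu_k'/2)}\; h_k^{\nu_k'/2 - 1} \exp\Big(-\frac{\rho_k'\nu_k' h_k}{2}\Big).$$

Direct calculations using the Bayes formula give the result in (7.31) ([4], Sect. 9.8.2, [33], Sect. 3.2.3). The density $f_k'$ is said to be a conjugate prior since it has the same functional form as $f_k''$. ▲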
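
The update in (7.31) is simple to implement once the regression matrix in (7.30) is assembled. The sketch below is a hedged illustration: it assumes the standard conjugate normal-gamma update in which the product $\rho_k\nu_k$ is updated as written above, and the function names `design_matrix` and `normal_gamma_update` as well as the noiseless test series are hypothetical.

```python
import numpy as np

def design_matrix(z, k):
    """Rows (1, z_j, ..., z_{j-k+1}) for the targets z_{k+1}, ..., z_n."""
    z = np.asarray(z, dtype=float)
    rows = [np.concatenate(([1.0], z[j - k:j][::-1])) for j in range(k, len(z))]
    return np.array(rows), z[k:]

def normal_gamma_update(beta0, gamma0, rho0, nu0, a, z):
    """Posterior parameters, assuming the conjugate update with rho*nu as in (7.31)."""
    gamma0_inv = np.linalg.inv(gamma0)
    gamma1_inv = gamma0_inv + a.T @ a
    gamma1 = np.linalg.inv(gamma1_inv)
    beta1 = gamma1 @ (gamma0_inv @ beta0 + a.T @ z)
    nu1 = nu0 + len(z)                       # len(z) = n - k regression rows
    rho1 = (rho0 * nu0 + beta0 @ gamma0_inv @ beta0
            - beta1 @ gamma1_inv @ beta1 + z @ z) / nu1
    return beta1, gamma1, rho1, nu1

if __name__ == "__main__":
    # Noiseless AR(1) readings z_{j+1} = 0.5 + 0.8 z_j (illustrative data).
    z = [0.0]
    for _ in range(9):
        z.append(0.5 + 0.8 * z[-1])
    a, zk = design_matrix(z, k=1)
    prior = (np.zeros(2), 1e8 * np.eye(2), 1.0, 2.0)   # vague prior (assumption)
    print(normal_gamma_update(*prior, a, zk))
```

With a vague prior, the posterior mean of the coefficients reduces to the least squares estimate, which gives a quick sanity check of an implementation.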

Arguments as in the above theorem can be used to find the posterior model probabilities $\{p_k''\}$. These probabilities can be used to identify optimal models [36, 37]. The random coefficients of an optimal model can be characterized by posterior densities of the type in (7.31). The solution of the resulting difference equation with random coefficients can be obtained by the methods outlined in the following two sections.

Some of the methods for solving difference equations with deterministic coefficients and random input can be extended to solve a class of difference equations with random coefficients and input. Alternative methods have also been proposed to solve approximately this class of difference equations. The methods in Sects. 7.3.2-7.3.5 on Monte Carlo, conditional analysis, stochastic reduced order models, stochastic Galerkin, and stochastic collocation apply to difference equations with coefficients of arbitrary uncertainty. The Taylor and perturbation series methods in Sects. 7.3.6-7.3.7 apply to difference equations with coefficients that have small uncertainty.


7.3.2 Monte Carlo Simulation 

As previously stated, Monte Carlo simulation is the only method that can be used 
to solve stochastic equations regardless of their size, structure, and complexity. The 
method involves the following three steps. First, independent samples of the random 
coefficients and driving noise need to be generated. Second, corresponding samples 
of the state vector need to be calculated. Third, resulting state samples are used to 
calculate state statistics. 

Example 7.16 Consider a mathematical pendulum with variable random length $L(t)$, and let $X(t)$ be the angle measured anticlockwise between the vertical line and the thread holding the pendulum mass. Then $X(t)$ is the solution of the differential equation

$$\ddot X(t) + Y(t)\,X(t) = 0, \quad t \ge 0, \tag{7.32}$$

where $Y(t) = g/L(t)$, $g$ is the gravitational constant, $Y(t) = a + (b - a)\,\Phi(Z(t))$, $\Phi$ denotes the distribution of $N(0, 1)$, and $Z(t)$ is a stationary Ornstein-Uhlenbeck process with mean 0 and variance 1, which can be approximated by $Z_{n+1} = \rho Z_n + \sqrt{1 - \rho^2}\,W_n$, $|\rho| < 1$, where $Z_n$ is an approximation for $Z(n\Delta t)$, $\Delta t > 0$ denotes the time step, and $W_n \sim$ iid $N(0, 1)$. The difference version of (7.32) is of the type in (7.22) with state vector $X_n = (X(n\Delta t), X((n-1)\Delta t), Z_n)'$, matrix

$$A_n(X_n) = \begin{bmatrix} 2X(n\Delta t) - X((n-1)\Delta t) - (\Delta t)^2\,X(n\Delta t)\,\big(a + (b - a)\,\Phi(Z_n)\big) \\ X(n\Delta t) \\ \rho Z_n \end{bmatrix},$$

matrix $B(X_n) = (0, 0, \sqrt{1 - \rho^2})'$, and driving noise $W_n \sim N(0, 1)$.

Figure 7.4 shows samples of $Y(t)$ in $[0, 100]$ for $\rho = 0.95$, $(a = 0.7, b = 1.0)$ (left panel), and $(a = 0.65, b = 1.0)$ (right panel) generated with a time step $\Delta t = 0.1$. The corresponding samples of $X(t)$ with $X(0) = 1$ are in Fig. 7.5. The pendulum oscillations seem to be stable for $(a = 0.70, b = 1.0)$ but are unstable for $(a = 0.65, b = 1.0)$. ◊
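
The three Monte Carlo steps of this section can be illustrated with the recurrence of Example 7.16. The sketch below generates one sample of $Y(t)$ and the corresponding sample of $X(t)$. It is illustrative only: the function name `simulate_pendulum` is hypothetical, and the starting condition $X(\Delta t) = X(0)$, that is, zero initial velocity, is an assumption since the text specifies only $X(0) = 1$.

```python
import math
import numpy as np

def simulate_pendulum(a, b, rho=0.95, dt=0.1, t_end=100.0, x0=1.0, seed=0):
    """One sample of (X, Y) on the grid n*dt using the recurrence of Example 7.16."""
    rng = np.random.default_rng(seed)
    n = int(round(t_end / dt))
    # Step 1: sample the random coefficient Y_n = a + (b - a) * Phi(Z_n).
    z = np.empty(n + 1)
    z[0] = rng.standard_normal()             # stationary start, Z_0 ~ N(0, 1)
    for i in range(n):
        z[i + 1] = rho * z[i] + math.sqrt(1.0 - rho**2) * rng.standard_normal()
    phi = np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z])
    y = a + (b - a) * phi
    # Step 2: compute the corresponding state sample.
    x = np.empty(n + 1)
    x[0] = x[1] = x0                         # assumption: zero initial velocity
    for i in range(1, n):
        x[i + 1] = 2.0 * x[i] - x[i - 1] - dt**2 * x[i] * y[i]
    return x, y

if __name__ == "__main__":
    # Step 3 (statistics) would repeat this over many seeds; one sample is shown.
    for a in (0.70, 0.65):
        x, y = simulate_pendulum(a, 1.0)
        print(a, float(np.max(np.abs(x))))
```

Statistics of $X(t)$ follow by repeating the calculation for independent seeds and averaging over the resulting samples.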
Fig. 7.4 Samples of $Y(t)$ for $\rho = 0.95$, $(a = 0.7, b = 1.0)$ (left panel) and $(a = 0.65, b = 1.0)$ (right panel)

Fig. 7.5 Samples of $X(t)$ for $\rho = 0.95$, $(a = 0.7, b = 1.0)$ (left panel) and $(a = 0.65, b = 1.0)$ (right panel)


Figure 7.5 shows that the solutions of equations with random coefficients exhibit a
broad range of features depending on the properties of their random coefficients. The 
characterization of the complex behavior of the solutions of equations with random 
coefficients is a major challenge that poses notable difficulties in the development of 
efficient and accurate approximate methods for solving this class of equations. 


7.3.3 Conditional Analysis 

The case of linear difference equations with random coefficients varying in time at
random or according to a Markov chain has been examined in a previous section. We
now examine the case in which the coefficients of these equations follow a semi-Markov
process. On samples of these coefficients, the state $X_n$ satisfies difference
equations with deterministic coefficients, so that methods of random vibrations can
be applied to find conditional statistics for $X_n$.

Suppose matrices $A_n$ and $B_n$ in (7.22) change according to a semi-Markov process
with random transition times $0 = T_0 < T_1 < \cdots < T_r < \cdots$ defined by
$T_r = T_{r-1} + S_r$, $r = 1, 2, \ldots$, where $\{S_r\}$ are $\{1, 2, \ldots\}$-valued iid random variables.
The recurrence formula (7.22) on samples of $A_n$ and $B_n$ defines difference equations
with deterministic coefficients driven by random noise in each time interval
$[T_{r-1}(\omega), T_r(\omega))$.

We consider the special case of (7.22) in which $A_n(X_n)$ is linear in $X_n$ and $B_n(X_n)$
does not depend on $X_n$, that is, linear models with additive noise. The driving noise
$W_n$ is white with mean $\mu_w(n)$ and covariance matrix $\gamma_w(n)$. On samples $(A^{(r)}, B^{(r)})$
of $\{A_n, B_n\}$ in the time interval $[T_{r-1}(\omega), T_r(\omega))$, the state equation is

$$X_{T_{r-1}(\omega)+s+1} = A^{(r)} X_{T_{r-1}(\omega)+s} + B^{(r)} W_{T_{r-1}(\omega)+s}, \quad r = 1, 2, \ldots, \tag{7.33}$$

where $s = 0, 1, \ldots, T_r(\omega) - T_{r-1}(\omega)$ is a local temporal coordinate and $A^{(r)}$, $B^{(r)}$
are constant deterministic matrices.

Theorem 7.8 In the time intervals $[T_{r-1}(\omega), T_r(\omega))$ on a sample $(A^{(r)}, B^{(r)})$ of
$(A_n, B_n)$, the mean vector $\mu^{(r)}(s) = E[X_{T_{r-1}(\omega)+s}]$ and the covariance matrix
$\gamma^{(r)}(s) = E[(X_{T_{r-1}(\omega)+s} - \mu^{(r)}(s))(X_{T_{r-1}(\omega)+s} - \mu^{(r)}(s))']$ satisfy the equations

$$\begin{aligned} \mu^{(r)}(s+1) &= A^{(r)} \mu^{(r)}(s) + B^{(r)} \mu_w(s) \\ \gamma^{(r)}(s+1) &= A^{(r)} \gamma^{(r)}(s) (A^{(r)})' + B^{(r)} \gamma_w(s) (B^{(r)})' \end{aligned} \tag{7.34}$$

for $s = 0, 1, \ldots, T_r(\omega) - T_{r-1}(\omega)$, where the second moment properties of $W_n$ are
given in the local coordinate.

Proof The expectation of (7.33) gives the first formula in (7.34). The second for- 
mula results by subtracting the mean equation from (7.33), multiplying the resulting 
equation with its transpose, and taking the expectation. 

The conditional mean and covariance of $X_n$ in $[T_0(\omega) = 0, T_1(\omega)]$ can be obtained
from (7.34) and second moment properties of $X_0$. The mean and covariance of
the state at $T_1(\omega)$ provide initial conditions for (7.34) to continue calculations in
$[T_1(\omega), T_2(\omega)]$ and so on. ▲
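The interval-by-interval use of (7.34) described in the proof can be sketched as follows for a scalar state ($d = 1$). The function name `conditional_moments`, the `regimes` argument (one tuple of coefficient values and interval length per sample interval), and the time-invariant noise moments are assumptions made for this illustration.

```python
def conditional_moments(mu0, gamma0, regimes, mu_w=0.0, gamma_w=1.0):
    """Propagate the conditional mean and variance of a scalar state by (7.34).

    regimes: list of (a, b, length) tuples, one per interval between
    consecutive jumps of the coefficients; a and b play the roles of
    A^(r) and B^(r), and length is the number of steps in the interval.
    """
    mu, gamma = mu0, gamma0
    means, variances = [mu], [gamma]
    for a, b, length in regimes:
        for _ in range(length):
            mu = a * mu + b * mu_w                      # mean equation in (7.34)
            gamma = a * gamma * a + b * gamma_w * b     # covariance equation in (7.34)
            means.append(mu)
            variances.append(gamma)
        # the moments at the end of this interval are the initial
        # conditions for the next interval
    return means, variances
```

For a vector state the same loop applies with matrix products and transposes in place of the scalar multiplications.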

Theorem 7.9 Under the conditions in Theorem 7.8, the covariance function
$c(p, q) = E[(X_p - E[X_p])(X_q - E[X_q])']$ of $X_n$ can be calculated from

$$\begin{aligned} c(s, s+p) &= \gamma^{(r)}(s)\,\bigl((A^{(r)})^p\bigr)' \\ c(s, t) &= \gamma^{(r)}(s)\,\bigl((A^{(r)})^{T_r(\omega)-s}\bigr)'\,\bigl((A^{(r+1)})^t\bigr)' \end{aligned} \tag{7.35}$$

for $p \ge 0$, $s$, $s + p$ local coordinates in $[T_{r-1}(\omega), T_r(\omega)]$, and $t$ local coordinate in
$[T_r(\omega), T_{r+1}(\omega)]$.

Proof We have $\tilde{X}_{s+p} = (A^{(r)})^p \tilde{X}_s$ + "noise" in local coordinates, where
$\tilde{X}_s = X_s - E[X_s]$ and "noise" includes all terms involving driving noise centered at its
mean. Hence, $E[\tilde{X}_s \tilde{X}_{s+p}'] = E[\tilde{X}_s ((A^{(r)})^p \tilde{X}_s)']$, which gives the first formula in
(7.35). For example, $\tilde{X}_{s+1} = A^{(r)} \tilde{X}_s + B^{(r)} \tilde{W}_s$ so that $E[\tilde{X}_s \tilde{X}_{s+1}'] = \gamma^{(r)}(s)(A^{(r)})'$,
where $\tilde{W}_s = W_s - E[W_s]$.

For the second formula, note that $\tilde{X}_t = (A^{(r+1)})^t \tilde{X}_0$ + "noise" in $[T_r(\omega), T_{r+1}(\omega)]$
and $\tilde{X}_{T_r(\omega)} = (A^{(r)})^{T_r(\omega)-s} \tilde{X}_s$ + "noise" in $[T_{r-1}(\omega), T_r(\omega)]$. Since $\tilde{X}_0$ for the time
interval $[T_r(\omega), T_{r+1}(\omega)]$ is $\tilde{X}_{T_r(\omega)}$ at the end of $[T_{r-1}(\omega), T_r(\omega)]$, we can calculate
$\tilde{X}_t$ from $\tilde{X}_t = (A^{(r+1)})^t (A^{(r)})^{T_r(\omega)-s} \tilde{X}_s$ + "noise". The expectation $E[\tilde{X}_s \tilde{X}_t']$ gives
the stated formula. Similar arguments can be used to find the covariance function at
arbitrary times. ▲

Unconditional second moment properties of $X_n$ can be obtained by Monte Carlo
simulation from the conditional second moment properties given by (7.34)-(7.35). A
similar approach can be followed for other properties of $X_n$.

Example 7.17 Let $X_n$ be a real-valued process defined by $X_{n+1} = Y_n X_n + W_n$,
$n = 0, 1, \ldots$, that is a special case of (7.22) with $d = 1$, $A_n(X_n) = Y_n X_n$, $B_n = 1$,
and $W_n$ independent $N(0, \sigma^2)$ variables. The times $S_r = T_r - T_{r-1}$, $r = 1, 2, \ldots$,
between consecutive jumps of the semi-Markov process $Y_n$ are assumed to be iid
$\{1, 2, \ldots\}$-valued random variables following a geometric distribution, that is,
$P(S_r = k) = (1 - p)^{k-1} p$, where $k = 1, 2, \ldots$ and $0 < p < 1$. The mean
and variance of $S_r$ are $E[S_r] = 1/p$ and $\mathrm{Var}[S_r] = (1 - p)/p^2$. We examine in
detail time continuous versions of this model in a subsequent section. ◊

Consider the special case of (7.22) defining a linear model with additive noise,
that is, $X_{n+1} = A X_n + B W_n$, where $A_n = A$ and $B_n = B$ are time invariant. The
second moment properties of the conditional state $X_n \mid (A, B)$ can be obtained from
the theory of discrete linear systems in Sect. 7.2.1. If the driving noise is Gaussian,
$X_n \mid (A, B)$ is a Gaussian sequence. The probability law of the unconditional series
$X_n$ can be obtained by eliminating the condition on $(A, B)$. The series $\{X_n\}$ is not
Gaussian.

Example 7.18 Suppose the real-valued series $X_n$ is the state of a linear model with
additive noise, initial state $X_0 \sim N(0, 1)$, coefficients $A \sim U(a_1, a_2)$, $-1 < a_1 < a_2 < 1$,
and $B \sim U(b_1, b_2)$, $0 < b_1 < b_2 < \infty$, independent of each other, and
driving noise $W_n \sim N(0, 1)$. The mean $\mu_n$, variance $\gamma_n$, and covariance function
$c(n, n + q)$ of the conditional series $X_n \mid (A, B)$ satisfy the equations

$$\begin{aligned} \mu_{n+1} &= A \mu_n, \\ \gamma_{n+1} &= A^2 \gamma_n + B^2, \\ c(n, n+q) &= A^q \gamma_n, \quad q \ge 0, \end{aligned} \tag{7.36}$$

implying $\mu_n = 0$, $\lim_{n\to\infty} \gamma_n = \gamma = B^2/(1 - A^2)$, $\lim_{m,n\to\infty} c(m, n) = \gamma A^{|m-n|}$,
and $P(X_n \le x \mid A, B) = \Phi(x/\sqrt{\gamma_n})$. Properties of $X_n$ result by eliminating
the condition on $(A, B)$. For example, $P(X_n \le x) = E[P(X_n \le x \mid A, B)]$
and $\lim_{n\to\infty} P(X_n \le x) = E[\Phi(x/\sqrt{\gamma})]$. Figure 7.6 shows the variation of
$\Phi^{-1}(P(X_n \le x))$ with $x$ for $n \to \infty$. Since the solid line is not straight, the
asymptotic marginal distribution of $X_n$ cannot be Gaussian. ◊
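The normal-score check used in this example can be sketched as below. The example does not give numerical values for $(a_1, a_2)$ and $(b_1, b_2)$, so the ranges used in the usage note, the sample size, the seed, and the function names are hypothetical; the expectation $E[\Phi(x/\sqrt{\gamma})]$ is estimated by plain Monte Carlo over $(A, B)$.

```python
import math
import random
from statistics import NormalDist


def asymptotic_cdf(x, a_range, b_range, n_samples=20000, seed=1):
    """Monte Carlo estimate of E[Phi(x / sqrt(gamma))] with gamma = B^2 / (1 - A^2)."""
    rng = random.Random(seed)            # same seed for every x: common random numbers
    phi = NormalDist().cdf
    total = 0.0
    for _ in range(n_samples):
        a = rng.uniform(*a_range)
        b = rng.uniform(*b_range)
        total += phi(x * math.sqrt(1.0 - a * a) / b)
    return total / n_samples


def normal_score(x, a_range, b_range):
    """Phi^{-1} of the asymptotic distribution; linear in x only for a Gaussian law."""
    return NormalDist().inv_cdf(asymptotic_cdf(x, a_range, b_range))
```

With the assumed ranges `a_range = (-0.9, 0.9)` and `b_range = (0.5, 2.0)`, the increments of `normal_score` over equal steps in `x` are not constant, which is the departure from a straight line that the example uses to rule out a Gaussian marginal distribution.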

Fig. 7.6 Dependence of $P(X_n \le x) = E[P(X_n \le x \mid A, B)]$ on $x$ for $n \to \infty$

Proof The expectation of $X_{n+1} = A X_n + B W_n$ conditional on $(A, B)$ gives the
conditional mean equation. Since $\mu_0 = E[X_0] = 0$, we have $\mu_n = 0$. The recurrence
formula for $\gamma_n$ follows by calculating the expectation of the square of
$X_{n+1} = A X_n + B W_n$ conditional on $(A, B)$. Repeated application of this formula gives
$\gamma_n = A^{2n} + B^2 \sum_{k=0}^{n-1} A^{2(n-k-1)}$ so that $\lim_{n\to\infty} \gamma_n = B^2/(1 - A^2)$ since $|A| < 1$ a.s.
by assumption and $\sum_{k=0}^{n-1} A^{2(n-k-1)}$ sums to $(1 - A^{2n})/(1 - A^2)$. The conditional
covariance equation follows by direct calculations from
$c(n, n+1) = E[X_n X_{n+1} \mid A, B] = E[X_n (A X_n + B W_n) \mid A, B] = A \gamma_n$,
$c(n, n+2) = E[X_n X_{n+2} \mid A, B] = A\, E[X_n X_{n+1} \mid A, B] = A^2 \gamma_n$, and so on.
The asymptotic marginal distribution of the unconditional series $X_n$ results from

$$\lim_{n\to\infty} P(X_n \le x) = \lim_{n\to\infty} E\left[\Phi\left(\frac{x}{\sqrt{\gamma_n}}\right)\right] = E\left[\lim_{n\to\infty} \Phi\left(\frac{x}{\sqrt{\gamma_n}}\right)\right] = E\left[\Phi\left(\frac{x \sqrt{1 - A^2}}{B}\right)\right],$$

where the last two equalities hold by bounded convergence and the continuity of the
Gaussian distribution.

Regarding the plot in Fig. 7.6, if the asymptotic distribution $P(X_n \le x)$ were
Gaussian with mean $\mu$ and standard deviation $\sigma$, then $\Phi^{-1}(P(X_n \le x))$ should be
equal to $(x - \mu)/\sigma$, so that it should plot as a straight line against $x$. ▲

Example 7.19 Consider the linear model in (7.25). The characteristic function
$\varphi_n(u) = E[\exp(iuX_n)]$ of $X_n$ can be obtained from $\varphi_{n+1}(u) = E[\varphi_n(uA_n)]\,\varphi_w(u)$,
where $\varphi_w(u) = E[\exp(iuW_n)]$. Figure 7.7 shows with solid and dotted lines the
real and the imaginary parts of $\varphi_n(u)$ for $n = 1$ (left panel) and $n = 10$ (right panel)
for $X_0 = 1$, $A_n \sim U(a_1, a_2)$ with $a_1 = -0.2$ and $a_2 = 0.9$, and $\gamma_w = 1$. The
characteristic function $\varphi_n(u)$ is real-valued and time invariant for $n \ge 10$, indicating
that for relatively large times the density of $X_n$ is an even function.

Recurrence formulas for the characteristic function are impractical for vector time
series since their implementation involves numerical evaluation of multidimensional
integrals at each step $n$. Also, the determination of the stationary marginal characteristic
function $\varphi(u)$ of $X_n$, provided it exists, is rather difficult since $\varphi(u)$ is the solution of
$\varphi(u) = E[\varphi(uA_n)]\,\varphi_w(u)$. ◊

Fig. 7.7 Real and imaginary parts of $\varphi_n(u)$ for $n = 1$ (left panel) and $n = 10$ (right panel)

Proof The expectation of $\exp(iuX_{n+1})$ is

$$\varphi_{n+1}(u) = E[\exp(iuA_nX_n)]\,E[\exp(iuW_n)] = E[\varphi_n(uA_n)]\,\varphi_w(u)$$

by properties of $W_n$ and of the conditional expectation. For the special case in which
$A_n = \rho \in (-1, 1)$ is deterministic and $W_n \sim N(0, \gamma_w)$, the stationary characteristic
function of $X_n$ can be obtained simply, and is $\varphi(u) = \exp(-\gamma_w u^2/(2(1 - \rho^2)))$. ▲
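The special case at the end of the proof is easy to check numerically, because for a deterministic coefficient the expectation $E[\varphi_n(uA_n)]$ reduces to $\varphi_n(\rho u)$. The sketch below iterates the recurrence and compares it with the stationary formula; the function names and the default $X_0 = 1$ are choices made for this illustration, and the random-coefficient case would additionally require a numerical integration over $A_n$ at each step.

```python
import cmath
import math


def phi_n(u, n, rho, gamma_w=1.0, x0=1.0):
    """Characteristic function of X_n for X_{k+1} = rho X_k + W_k, W_k ~ N(0, gamma_w)."""
    if n == 0:
        return cmath.exp(1j * u * x0)            # deterministic initial state X_0 = x0
    phi_w = math.exp(-0.5 * gamma_w * u * u)     # characteristic function of W_k
    # recurrence phi_{n}(u) = phi_{n-1}(rho u) phi_w(u) for deterministic A_n = rho
    return phi_n(rho * u, n - 1, rho, gamma_w, x0) * phi_w


def stationary_cf(u, rho, gamma_w=1.0):
    """Stationary characteristic function exp(-gamma_w u^2 / (2 (1 - rho^2)))."""
    return math.exp(-gamma_w * u * u / (2.0 * (1.0 - rho * rho)))
```

For $|\rho| < 1$ and large $n$ the imaginary part of `phi_n` vanishes and its real part approaches `stationary_cf`, in line with the remark that the density of $X_n$ becomes an even function at large times.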


7.3.4 Stochastic Reduced Order Models 

Stochastic reduced order models (SROMs) are discussed in Sects. A.3 and A.4. These models
are simple random elements whose samples are selected from samples of target
random elements and, generally, are not equally likely. For example, a SROM $\tilde{A}_n$
for $A_n$ in (7.25) is a simple random variable with samples $(\tilde{a}_1, \ldots, \tilde{a}_m)$ that have
probabilities $(p_1, \ldots, p_m)$. We calculate properties of the state $X_n$ in (7.25) from
those of the state $\tilde{X}_n$ of this equation with $\tilde{A}_n$ in place of $A_n$ under the assumption that
$\{A_n\}$ is an iid series.

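Theorem 7.10 The marginal distribution $\tilde{F}_n$ and the moments $\tilde{\mu}_n(r) = E[\tilde{X}_n^r]$ of
order $r \ge 1$ of $\tilde{X}_n$ defined by (7.25) with $\tilde{A}_n$ in place of $A_n$ satisfy the recurrence
formulas

$$\tilde{F}_{n+1}(x) = \sum_{j=1}^{m} p_j \int \tilde{F}_n\bigl((x - u)/\tilde{a}_j\bigr)\, dF_w(u)$$

and

$$\tilde{\mu}_{n+1}(r) = \sum_{j=1}^{m} p_j \sum_{s=0}^{r} \frac{r!}{s!\,(r - s)!}\, \tilde{a}_j^{\,s}\, \tilde{\mu}_n(s)\, E[W_n^{r-s}], \quad n = 0, 1, \ldots, \tag{7.37}$$

where $\tilde{F}_0(x) = F_0(x) = P(X_0 \le x)$ and $\tilde{\mu}_0(r) = \mu_0(r) = E[X_0^r]$. It is assumed
that $\{\tilde{A}_n\}$ is an iid series.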

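Proof The distribution of $\tilde{X}_{n+1}$ defined by (7.25) with $\tilde{A}_n$ in place of $A_n$ can be
calculated from

$$\tilde{F}_{n+1}(x) = E[1(\tilde{X}_{n+1} \le x)] = E\{E[1(\tilde{A}_n \tilde{X}_n + W_n \le x) \mid \tilde{A}_n]\} = E\left[\int \tilde{F}_n\bigl((x - u)/\tilde{A}_n\bigr)\, dF_w(u)\right] = \sum_{j=1}^{m} p_j \int \tilde{F}_n\bigl((x - u)/\tilde{a}_j\bigr)\, dF_w(u),$$

where $F_w$ denotes the distribution of $W_n$. The recurrence formula for moments is

$$E[\tilde{X}_{n+1}^r] = E\{E[(\tilde{A}_n \tilde{X}_n + W_n)^r \mid \tilde{A}_n]\} = E\left[\sum_{s=0}^{r} \frac{r!}{s!\,(r - s)!}\, \tilde{A}_n^{\,s}\, E[\tilde{X}_n^s]\, E[W_n^{r-s}]\right] = \sum_{j=1}^{m} p_j \sum_{s=0}^{r} \frac{r!}{s!\,(r - s)!}\, \tilde{a}_j^{\,s}\, E[\tilde{X}_n^s]\, E[W_n^{r-s}],$$

where $r > 0$ is an integer. ▲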

Example 7.20 Let $X_n$ be defined by (7.25) with $A_n$ independent $U(a_1, a_2)$,
$-1 < a_1 < a_2 < 1$, random variables, $W_n \sim N(0, 1)$ a Gaussian white noise, and $X_0 = 1$.
Let $\tilde{A}_n$ be a SROM for $A_n$ with $m \ge 1$ samples $\tilde{a}_j = a_1 + (j - 1/2)(a_2 - a_1)/m$ and
probabilities $p_j = 1/m$, $j = 1, \ldots, m$. Figure 7.8 shows with solid and dotted lines
Monte Carlo estimates and SROM-based approximations of the first six moments of
$X_n$ for $n = 0, 1, \ldots, 10$, that is, the moments $E[X_n^r]$, $r = 1, \ldots, 6$, during the time
interval [0, 10]. The Monte Carlo estimates are based on 100000 independent samples
of $X_n$, and the SROM-based approximations use a SROM $\tilde{A}_n$ with $m = 5$ samples. The
plots are for $(a_1, a_2) = (-0.2, 0.9)$. The largest discrepancies in [0, 10] between Monte
Carlo estimates and SROM-based approximations for $E[X_n^r]$ are 0.8475%, 1.5628%,
3.5342%, 3.5174%, 5.1714%, and 6.5156% for $r = 1, \ldots, 6$. The second recurrence
formula in (7.37) has been used to obtain the SROM-based moments. ◊
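The moment recurrence in (7.37) with the SROM of this example can be sketched as follows. The function names are introduced here for illustration, and the code is an independent sketch of the recurrence rather than the program used to produce Fig. 7.8; the Gaussian moments $E[W_n^k]$ are evaluated in closed form.

```python
from math import comb


def gaussian_moment(k):
    """E[W^k] for W ~ N(0, 1): zero for odd k and (k - 1)!! for even k."""
    if k % 2:
        return 0.0
    result = 1.0
    for j in range(k - 1, 0, -2):
        result *= j
    return result


def srom_moments(a1, a2, m, r_max, n_steps, x0=1.0):
    """Moments E[X_n^r], r = 0..r_max, n = 0..n_steps, from the second formula in (7.37)."""
    samples = [a1 + (j - 0.5) * (a2 - a1) / m for j in range(1, m + 1)]
    probs = [1.0 / m] * m
    mu = [x0 ** r for r in range(r_max + 1)]     # deterministic initial state
    history = [mu]
    for _ in range(n_steps):
        new_mu = []
        for r in range(r_max + 1):
            total = 0.0
            for a, p in zip(samples, probs):
                total += p * sum(
                    comb(r, s) * a ** s * mu[s] * gaussian_moment(r - s)
                    for s in range(r + 1)
                )
            new_mu.append(total)
        mu = new_mu
        history.append(mu)
    return history
```

A call such as `srom_moments(-0.2, 0.9, 5, 6, 10)` returns the SROM-based moments of orders 0 to 6 at times 0 to 10; the first moment at time $n$ equals the $n$-th power of the SROM mean, which gives a quick sanity check.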

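Theorem 7.11 The average discrepancy $e(n) = E[|X_n - \tilde{X}_n|]$ between the exact
and the SROM-based solution satisfies the relationship

$$e(n + 1) \le \alpha \sum_{i=0}^{n} \beta^i, \tag{7.38}$$

where $\alpha = \max_n\{E[|A_n - \tilde{A}_n|]\, E[|X_n|]\}$ and $\beta = \max_n\{E[|\tilde{A}_n|]\}$.

Proof We have $|X_{n+1} - \tilde{X}_{n+1}| = |A_n X_n - \tilde{A}_n \tilde{X}_n| \le |A_n - \tilde{A}_n|\,|X_n| + |\tilde{A}_n|\,|X_n - \tilde{X}_n|$,
implying $e(n + 1) \le E[|A_n - \tilde{A}_n|]\, E[|X_n|] + E[|\tilde{A}_n|]\, e(n)$. Since $e(0) = 0$, we have
$e(n + 1) \le \alpha + \beta e(n) \le \alpha + \beta(\alpha + \beta e(n - 1)) \le \cdots \le \alpha \sum_{i=0}^{n} \beta^i$. If $\alpha$ and $\beta$ are finite
and $\beta < 1$, then $e(n) \le \alpha(1 - \beta^n)/(1 - \beta) \le \alpha/(1 - \beta)$ at all times. ▲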

Fig. 7.8 Monte Carlo estimates and SROM-based approximations of $E[X_n^r]$, $r = 1, \ldots, 6$, for $(a_1, a_2) = (-0.2, 0.9)$



7.3.5 Stochastic Galerkin and Collocation Methods 

The first step of the stochastic Galerkin and collocation methods is the representation 
of the random elements in the definition of a stochastic equation by deterministic 
functions that depend on small numbers of random variables. These methods are 
discussed extensively later in this chapter in the context of stochastic differential 
equations with random coefficients and, also, in the following two chapters. Their 
usefulness to stochastic equations of the type considered here seems to be rather 
limited. 

For example, the state in the recurrence formula (7.25) has the expression

$$X_{n+1} = \left(\prod_{i=0}^{n} A_i\right) X_0 + \sum_{i=0}^{n-1} \left(\prod_{j=i+1}^{n} A_j\right) W_i + W_n, \quad n = 0, 1, \ldots, \tag{7.39}$$

so that it is a function of $(A_0, A_1, \ldots, A_n)$ in addition to the driving noise and initial
state. If $\{X_n\}$ becomes stationary as $n \to \infty$ and we are interested in properties of
the stationary version of $\{X_n\}$, the random vector $(A_0, A_1, \ldots, A_n)$ will have an
infinite number of coordinates. Also, $(A_0, A_1, \ldots, A_n)$ can be a very large vector if
the behavior of $\{X_n\}$ over relatively large time intervals is of interest, so that direct
solutions by the Galerkin and collocation methods are likely to be impractical.
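The expression in (7.39) can be checked against the recurrence $X_{n+1} = A_n X_n + W_n$ on arbitrary sample values. The sketch below does this for lists `a` $= (A_0, \ldots, A_n)$ and `w` $= (W_0, \ldots, W_n)$; the function names are introduced here for illustration.

```python
import math


def state_by_recursion(a, w, x0):
    """X_{n+1} obtained by iterating X_{k+1} = A_k X_k + W_k for k = 0, ..., n."""
    x = x0
    for a_k, w_k in zip(a, w):
        x = a_k * x + w_k
    return x


def state_by_formula(a, w, x0):
    """X_{n+1} from the closed form (7.39), with a = (A_0..A_n) and w = (W_0..W_n)."""
    n = len(a) - 1
    x = math.prod(a) * x0 + w[n]
    for i in range(n):
        x += math.prod(a[i + 1:]) * w[i]     # product of A_{i+1}, ..., A_n times W_i
    return x
```

The two functions agree up to rounding for any inputs of equal length, which makes explicit that $X_{n+1}$ depends on all $n + 1$ coefficients $(A_0, \ldots, A_n)$.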

7.3.6 Taylor Series 

We have seen that the real-valued series defined by (7.39) is the solution of the finite
difference equation in (7.25). The formula in (7.39) shows that $X_{n+1}$ is a function
of $(A_0, A_1, \ldots, A_n)$, $X_0$, and $(W_0, W_1, \ldots, W_n)$. We view $X_{n+1}$ as a function of
$(A_0, A_1, \ldots, A_n)$, expand it in Taylor series about $(E[A_0], E[A_1], \ldots, E[A_n])$, and
use a truncated version of this series to calculate moments of $X_{n+1}$ approximately.
It is assumed that $\{A_n\}$ and $\{W_n\}$ are iid series.

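Theorem 7.12 The approximate mean and variance of $X_{n+1}$ given by its first order
Taylor expansion about the mean of $(A_0, A_1, \ldots, A_n)$ are

$$\begin{aligned} E[X_{n+1}] &\simeq E[X_0]\, \rho^{n+1} \\ \mathrm{Var}[X_{n+1}] &\simeq \mathrm{Var}[X_0]\, \rho^{2(n+1)} + \gamma_w \sum_{i=0}^{n} \rho^{2(n-i)} + \gamma_a \sum_{k=0}^{n} \left( E[X_0^2]\, \rho^{2n} + \gamma_w \sum_{i=0}^{k-1} \rho^{2(n-i-1)} \right) \end{aligned} \tag{7.40}$$

with the convention $\sum_{i=0}^{-1}(\cdot) = 0$ and the notations $\rho = E[A_n]$, $\gamma_a = \mathrm{Var}[A_n]$, and
$\gamma_w = \mathrm{Var}[W_n]$.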

Proof The random variables $X_{n+1}$ and $\partial X_{n+1}/\partial A_k$ with $(A_0, A_1, \ldots, A_n)$ set equal
to its mean value are

$$X_0 \rho^{n+1} + \sum_{i=0}^{n-1} W_i\, \rho^{n-i} + W_n \quad \text{and} \quad X_0 \rho^{n} + \sum_{i=0}^{k-1} W_i\, \rho^{n-i-1},$$

respectively, so that the first order Taylor approximation of $X_{n+1}$ about the mean of
$(A_0, A_1, \ldots, A_n)$ is

$$X_{n+1} \simeq \left( X_0 \rho^{n+1} + \sum_{i=0}^{n-1} W_i\, \rho^{n-i} + W_n \right) + \sum_{k=0}^{n} \left( X_0 \rho^{n} + \sum_{i=0}^{k-1} W_i\, \rho^{n-i-1} \right) (A_k - E[A_k]). \tag{7.41}$$

The mean and variance of this approximate representation for $X_{n+1}$ are given by
(7.40). The calculation of these moments uses the assumptions that $(A_0, A_1, \ldots)$ and
$(W_0, W_1, \ldots)$ are mutually independent, and are also independent of $X_0$. ▲


Example 7.21 Suppose the initial state is deterministic and equal to $X_0 = 1$ so that
the approximate mean and variance of the state can be calculated from (7.40) with
$E[X_0] = 1$ and $\mathrm{Var}[X_0] = 0$. The plots in the left and the right panels of Fig. 7.9
are means and standard deviations of $X_n$ as functions of $n$ for $A_n \sim U(a_1, a_2)$ and
$\gamma_w = 1$. The solid and dotted lines are Monte Carlo estimates based on 10000 samples
and first order Taylor approximations. The top and bottom panels are for
$(a_1, a_2) = (-0.2, 0.9)$ and $(a_1, a_2) = (0.2, 0.9)$, respectively. The approximate means are
satisfactory but the approximate standard deviations are inaccurate for
$(a_1, a_2) = (-0.2, 0.9)$. The accuracy of the approximate standard deviations deteriorates as
the uncertainty in $A_n$ increases from $A_n \sim U(0.2, 0.9)$ to $A_n \sim U(-0.2, 0.9)$.
Numerical values for the plots in Fig. 7.9 have been obtained by using the variance
formula in (7.40). ◊
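The first order Taylor moments can be evaluated directly from the mean and variance of the approximate representation (7.41). The sketch below is such an evaluation under the stated independence assumptions with $E[W_n] = 0$; the function name and argument names are chosen here for illustration, and the noise variance is kept explicit in the inner sum as it follows from (7.41).

```python
def taylor_moments(n, rho, gamma_a, gamma_w, mean_x0, var_x0):
    """First order Taylor approximations of E[X_{n+1}] and Var[X_{n+1}] from (7.41).

    rho = E[A_n], gamma_a = Var[A_n], gamma_w = Var[W_n]; assumes E[W_n] = 0.
    """
    second_x0 = var_x0 + mean_x0 ** 2                  # E[X_0^2]
    mean = mean_x0 * rho ** (n + 1)
    var = var_x0 * rho ** (2 * (n + 1))
    var += gamma_w * sum(rho ** (2 * (n - i)) for i in range(n + 1))
    for k in range(n + 1):
        # the inner sum is empty for k = 0, matching the stated convention
        inner = sum(rho ** (2 * (n - i - 1)) for i in range(k))
        var += gamma_a * (second_x0 * rho ** (2 * n) + gamma_w * inner)
    return mean, var
```

For $A_n \sim U(-0.2, 0.9)$ one would call `taylor_moments(n, 0.35, 1.1**2 / 12, 1.0, 1.0, 0.0)`. For $n = 0$ the approximation is exact because $X_1 = A_0 X_0 + W_0$ is linear in $A_0$; the error grows with $n$ and with the variance of $A_n$.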
Fig. 7.9 Monte Carlo estimates (solid lines) and Taylor approximations (dotted lines) of $E[X_n]$ and
$\mathrm{Std}[X_n]$. The top and bottom panels are for $A_n \sim U(-0.2, 0.9)$ and $A_n \sim U(0.2, 0.9)$, respectively.
The left and right panels show means and standard deviations of $X_n$


7.3.7 Perturbation Series 

The perturbation method is useful for analyzing dynamic systems with small non- 
linearities. We apply this method to stochastic difference equations with random 
coefficients that have small uncertainty. 

Consider the model in (7.25) with $E[X_0] = 0$, $E[X_0^2] < \infty$, random coefficients
$A_n = \rho + \varepsilon Y_n$ depending on the iid random variables $\{Y_n\}$ with mean 0 and variance
$\gamma_y$ such that $|A_n| < 1$ a.s., and $|\varepsilon| \ll 1$. Assume that the power series representation,

$$X_n = X_{n,0} + \varepsilon X_{n,1} + \varepsilon^2 X_{n,2} + \cdots, \tag{7.42}$$

of the solution of (7.25) is convergent, and approximate $X_n$ by the first two terms of
this series, that is, $X_n \simeq \tilde{X}_n = X_{n,0} + \varepsilon X_{n,1}$. The time series $(X_{n,0}, X_{n,1})$ satisfy
the difference equations

$$\begin{aligned} X_{n+1,0} &= \rho X_{n,0} + W_n \quad \text{and} \\ X_{n+1,1} &= \rho X_{n,1} + Y_n X_{n,0}. \end{aligned} \tag{7.43}$$

Note that these equations have the same operator but different inputs.

Theorem 7.13 The variance and covariance functions of $X_{n,0}$, $X_{n,1}$, and
$\tilde{X}_n = X_{n,0} + \varepsilon X_{n,1}$ defined by (7.42) and (7.43) can be calculated from

$$\begin{aligned} \gamma_{00}(n + 1) &= \rho^2 \gamma_{00}(n) + \gamma_w \\ \gamma_{11}(n + 1) &= \rho^2 \gamma_{11}(n) + \gamma_y \gamma_{00}(n) \\ \gamma_{01}(n + 1) &= \rho^2 \gamma_{01}(n) \\ c_{kl}(m, n) &= \rho^{|m-n|}\, \gamma_{kl}(m \wedge n), \quad k, l = 0, 1, \end{aligned} \tag{7.44}$$

where $\gamma_{kl}(n) = E[X_{n,k} X_{n,l}]$ and $c_{kl}(m, n) = E[X_{m,k} X_{n,l}]$. The sequences $X_{n,0}$,
$X_{n,1}$, and $\tilde{X}_n$ have mean 0.

Proof The bivariate vector $Z_n = (X_{n,0}, X_{n,1})'$ satisfies the difference equation

$$Z_{n+1} = \begin{bmatrix} X_{n+1,0} \\ X_{n+1,1} \end{bmatrix} = \begin{bmatrix} \rho & 0 \\ Y_n & \rho \end{bmatrix} \begin{bmatrix} X_{n,0} \\ X_{n,1} \end{bmatrix} + \begin{bmatrix} 1 \\ 0 \end{bmatrix} W_n = a_n Z_n + \beta W_n,$$

so that $E[Z_{n+1}] = E[a_n Z_n] = E\{E[a_n Z_n \mid Y_n]\} = E\{a_n E[Z_n]\} = E[a_n]\, E[Z_n]$
since $a_n$ is a function of $Y_n$, $Z_n$ depends on $(Y_{n-1}, Y_{n-2}, \ldots)$, $\{Y_n\}$ is an iid
sequence, and $E[W_n] = 0$ by assumption. The conditional covariance
$\gamma(n) = E[Z_n Z_n' \mid Y_n]$ is the solution of $\gamma(n + 1) = a_n \gamma(n) a_n' + \beta \beta' \gamma_w$ (Sect. 7.2.1),
so that the unconditional covariance $\gamma(n) = E[Z_n Z_n']$ of $Z_n$ can be obtained from

$$\begin{aligned} \gamma(n + 1) &= E[(a_n Z_n + \beta W_n)(a_n Z_n + \beta W_n)'] = E[a_n Z_n Z_n' a_n'] + \beta \beta' \gamma_w \\ &= E\{E[a_n Z_n Z_n' a_n' \mid Y_n, Y_{n-1}, Y_{n-2}, \ldots]\} + \beta \beta' \gamma_w \\ &= E\{a_n E[Z_n Z_n' \mid Y_{n-1}, Y_{n-2}, \ldots]\, a_n'\} + \beta \beta' \gamma_w = E\{a_n \gamma(n) a_n'\} + \beta \beta' \gamma_w \end{aligned}$$

since $a_n$ and $Z_n$ are independent of $W_n$, $W_n$ has mean 0 and variance $\gamma_w$, $a_n$ is a
function of only $Y_n$ and $Z_n$ depends on $(Y_{n-1}, Y_{n-2}, \ldots)$, and $\{Y_n\}$ is an iid series.
This recurrence formula gives the first three equations in (7.44). Similar arguments
can be used to obtain the last equality in (7.44). ▲

Example 7.22 Suppose $Y_n$ and $W_n$ in Theorem 7.13 are iid $U(-1, 1)$ and iid $N(0, 1)$,
respectively. Under the assumption that the initial state $X_0$ is deterministic, we have
$X_{0,0} = 0$, $X_{0,1} = 0$, $E[X_{n,0} + \varepsilon X_{n,1}] = 0$, $\gamma_{kl}(0) = 0$, and
$E[(X_{n,0} + \varepsilon X_{n,1})^2] = \gamma_{00}(n) + \varepsilon^2 \gamma_{11}(n)$. The solid and dotted lines in Fig. 7.10 are Monte Carlo estimates
and first order perturbation solutions for the standard deviation $\mathrm{Std}[X_n]$ of $X_n$
corresponding to $(\rho = 0.3, \varepsilon = 0.2)$ (left panel) and $(\rho = 0.3, \varepsilon = 0.69)$ (right panel).
The Monte Carlo estimates are based on 10000 independent samples of $X_n$. The
accuracy of perturbation solutions decreases as $\varepsilon$ increases but remains satisfactory for the
cases considered here. ◊
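The variance recurrences in (7.44) are straightforward to iterate. The sketch below does so for a zero initial state and, as a reference, also iterates the second moment recursion $E[X_{n+1}^2] = E[A_n^2]\,E[X_n^2] + \gamma_w$ with $E[A_n^2] = \rho^2 + \varepsilon^2 \gamma_y$, which holds for (7.25) when $A_n$ is independent of $X_n$ and $E[X_n] = 0$. This reference recursion and the function names are additions made for this illustration; the example itself compares against Monte Carlo estimates.

```python
def perturbation_variance(rho, eps, gamma_y, gamma_w, n_steps):
    """Var of X_{n,0} + eps X_{n,1} from (7.44), starting from X_{0,0} = X_{0,1} = 0."""
    g00 = g11 = 0.0                      # gamma_01 stays at 0, so it is not tracked
    out = [0.0]
    for _ in range(n_steps):
        # the right side uses the values at step n for both updates
        g00, g11 = rho ** 2 * g00 + gamma_w, rho ** 2 * g11 + gamma_y * g00
        out.append(g00 + eps ** 2 * g11)
    return out


def reference_variance(rho, eps, gamma_y, gamma_w, n_steps):
    """Var[X_n] for X_{n+1} = (rho + eps Y_n) X_n + W_n with X_0 = 0 and iid inputs."""
    v, out = 0.0, [0.0]
    for _ in range(n_steps):
        v = (rho ** 2 + eps ** 2 * gamma_y) * v + gamma_w
        out.append(v)
    return out
```

For $Y_n \sim U(-1, 1)$ the variance is $\gamma_y = 1/3$. Comparing the square roots of the two outputs for $(\rho, \varepsilon) = (0.3, 0.2)$ and $(0.3, 0.69)$ shows how the gap between the perturbation solution and the reference widens as $\varepsilon$ grows.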
Fig. 7.10 Monte Carlo estimates (solid lines) and perturbation approximations (dotted lines) of
$\mathrm{Std}[X_n]$ for $(\rho = 0.3, \varepsilon = 0.2)$ (left panel) and $(\rho = 0.3, \varepsilon = 0.69)$ (right panel)

7.4 Stochastic Differential Equations with Random Coefficients 

We consider equations of the type in (7.1) and (7.2), state conditions for the existence
of unique solutions for these equations, present methods for solving differential
equations with random coefficients and input, and illustrate numerically the
implementation of some of these methods.

7.4.1 General Considerations 

The probability laws of the solutions of stochastic differential equations (SDEs) 
are determined by properties of their drift/diffusion coefficients and driving noise 
processes, and can be affected significantly by the uncertainty in their coefficients. For 
example, the solutions of linear SDEs with Gaussian noise are Gaussian processes, 
but become non-Gaussian if their coefficients are uncertain. 

Example 7.23 Let $X(t)$, $t \ge 0$, be the solution of $\dot{X}(t) + A X(t) = Y(t)$, where
$Y(t)\,dt = dB(t)$, $B$ denotes a standard Brownian motion, $A > a > 0$ a.s. is a random
variable, $E[X(0)] = \mu_0$, and $\mathrm{Var}[X(0)] = \gamma_0$. The conditional mean
$\mu(t; A) = E[X(t) \mid A]$ and variance $\gamma(t; A) = \mathrm{Var}[X(t) \mid A]$ satisfy the equations

$$\begin{aligned} \dot{\mu}(t; A) &= -A \mu(t; A) \\ \dot{\gamma}(t; A) &= -2A \gamma(t; A) + 1 \end{aligned}$$

so that $\mu(t; A) = \mu_0 \exp(-At)$ and $\gamma(t; A) = \gamma_0 \exp(-2At) + (1 - \exp(-2At))/(2A)$,
implying the a.s. convergence $\mu(t; A) \to 0$ and $\gamma(t; A) \to 1/(2A)$ as $t \to \infty$.

The conditional random variable $X(t) \mid A$ is Gaussian with mean $\mu(t; A)$ and
variance $\gamma(t; A)$. On the other hand, $X(t)$ is not Gaussian. For example, estimates
of the kurtosis coefficient of $X(t)$ for $A \sim U(0.1, 3)$ based on 10000 independent
samples are 0.5442, 1.8578, and 2.3515 at times $t = 1$, 5, and 10. ◊
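The departure from the Gaussian law can be quantified without sampling when $X(0) = 0$, since $X(t)$ is then a scale mixture of zero-mean Gaussian variables with $E[X(t)^2] = E[\gamma(t; A)]$ and $E[X(t)^4] = 3E[\gamma(t; A)^2]$. The sketch below evaluates the resulting excess kurtosis $3E[\gamma^2]/E[\gamma]^2 - 3$ by a midpoint rule over $A \sim U(a_{lo}, a_{hi})$. The zero initial state, the excess-kurtosis convention, and the function names are assumptions; the example does not state the initial conditions behind its estimates, so this calculation is not expected to reproduce the quoted values.

```python
import math


def conditional_variance(t, a, gamma0=0.0):
    """gamma(t; A) = gamma0 exp(-2At) + (1 - exp(-2At)) / (2A)."""
    decay = math.exp(-2.0 * a * t)
    return gamma0 * decay + (1.0 - decay) / (2.0 * a)


def excess_kurtosis(t, a_lo, a_hi, n_nodes=2000):
    """Excess kurtosis of X(t) for A ~ U(a_lo, a_hi), assuming X(0) = 0."""
    h = (a_hi - a_lo) / n_nodes
    m2 = m4 = 0.0
    for j in range(n_nodes):
        g = conditional_variance(t, a_lo + (j + 0.5) * h)   # midpoint rule in A
        m2 += g                  # E[X^2 | A] = gamma
        m4 += 3.0 * g * g        # E[X^4 | A] = 3 gamma^2 for a zero-mean Gaussian
    m2 /= n_nodes
    m4 /= n_nodes
    return m4 / m2 ** 2 - 3.0
```

The result is zero when $A$ is deterministic and strictly positive otherwise, by Jensen's inequality, which is the analytical counterpart of the statement that $X(t)$ is not Gaussian.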

Solutions for stochastic differential equations of the type discussed in Sect. 5.5.1,
that is, ordinary differential equations with deterministic coefficients and white noise
input can be defined in the strong and weak sense. We consider here an alternative type
of weak solutions for ordinary differential equations with random coefficients that 
relates to weak convergence in Hilbert spaces. Weak solutions of this type are used 
extensively to solve stochastic partial differential equations (Sect. 9.4). Following is 
a brief discussion on weak solutions for ordinary differential equations with random 
coefficients and input. 

The weak formulation for initial/boundary value problems requires finding $u \in H_1$
such that $\mathcal{B}(u, v) = \mathcal{L}(v)$, $\forall v \in H_2$, where $\mathcal{B} : H_1 \times H_2 \to \mathbb{R}$ and $\mathcal{L} : H_2 \to \mathbb{R}$
are bilinear and linear functionals and $H_1$ and $H_2$ denote Hilbert spaces. The Lax-Milgram
theorem states conditions that $\mathcal{B}$ and $\mathcal{L}$ must satisfy such that $\mathcal{B}(u, v) = \mathcal{L}(v)$
has a unique solution for the special case $H_1 = H_2$. The Babuska-Lax-Milgram
theorem [40] removes the restriction $H_1 = H_2$.

Theorem 7.14 (Babuska-Lax-Milgram theorem) Let $\mathcal{B} : H_1 \times H_2 \to \mathbb{R}$ be a
continuous bilinear functional. If $\mathcal{B}$ is weakly coercive, that is, there exists a constant
$c > 0$ such that

$$\sup_{\|v\|_{H_2} = 1} |\mathcal{B}(u, v)| \ge c\, \|u\|_{H_1} \quad \text{and} \quad \sup_{u \in H_1} |\mathcal{B}(u, v)| > 0, \quad \forall v \in H_2 \setminus \{0\}, \tag{7.45}$$

then for all $f \in H_2$ there exists a unique solution $u_f \in H_1$ such that $\mathcal{B}(u_f, v) = \mathcal{L}(v)$
for all $v \in H_2$, where $\mathcal{L} : H_2 \to \mathbb{R}$ is given by $\mathcal{L}(v) = \langle f, v \rangle_{H_2}$. Moreover,
the unique solution $u_f$ is bounded by $\|u_f\|_{H_1} \le \|f\|_{H_2}/c$.

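Example 7.24 Let $X(t)$, $0 \le t \le \tau$, be a real-valued stochastic process defined by
the stochastic differential equation $\dot{X}(t) + A X(t) = Y(t)$, $t \in I = [0, \tau]$, with
initial state $X(0) = 0$. It is assumed that the random variable $A$ and the real-valued
stochastic process $Y(t)$ are defined on the same probability space $(\Omega, \mathcal{F}, P)$, $A$ is
independent of $Y(t)$, $P(a_1 < A < a_2) = 1$, $0 < a_1 < a_2 < \infty$, and $Y(t)$ has finite
variance.

Consider the Hilbert spaces

$$\begin{aligned} \mathcal{V} &= \{Z : I \times \Omega \to \mathbb{R},\ Z \in L_2(I \times \Omega),\ \mathcal{B}(I) \times \mathcal{F}\text{-measurable}\} \quad \text{and} \\ \mathcal{W} &= \{U : I \times \Omega \to \mathbb{R},\ \dot{U} \in L_2(I \times \Omega),\ \mathcal{B}(I) \times \mathcal{F}\text{-measurable},\ U(0) = 0 \text{ a.s.}\} \end{aligned} \tag{7.46}$$

with the inner products

$$\langle Z_1, Z_2 \rangle_{\mathcal{V}} = E\left[\int_I Z_1(t) Z_2(t)\, dt\right] = \int_{I \times \Omega} Z_1(t, \omega) Z_2(t, \omega)\, dt\, P(d\omega) \quad \text{and} \quad \langle U_1, U_2 \rangle_{\mathcal{W}} = \langle \dot{U}_1, \dot{U}_2 \rangle_{\mathcal{V}} + \langle U_1, U_2 \rangle_{\mathcal{V}}, \tag{7.47}$$

respectively (Exercise 7.13), so that $\mathcal{V} = L_2(I \times \Omega, \mathcal{B}(I) \times \mathcal{F}, \lambda \times P)$, where
$\lambda(dt) = dt$ denotes the Lebesgue measure on the real line. The norms induced by
these inner products on $\mathcal{V}$ and $\mathcal{W}$ are $\|Z\|_{\mathcal{V}}^2 = \langle Z, Z \rangle_{\mathcal{V}} = E[\int_I Z(t)^2\, dt]$ and
$\|U\|_{\mathcal{W}}^2 = \langle U, U \rangle_{\mathcal{W}} = E[\int_I (\dot{U}(t)^2 + U(t)^2)\, dt] = \|\dot{U}\|_{\mathcal{V}}^2 + \|U\|_{\mathcal{V}}^2$.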

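Our objective is to find $X \in \mathcal{W}$ such that $\mathcal{B}(X, Z) = \mathcal{L}(Z)$, $\forall Z \in \mathcal{V}$, where
the functionals $\mathcal{B} : \mathcal{W} \times \mathcal{V} \to \mathbb{R}$ and $\mathcal{L} : \mathcal{V} \to \mathbb{R}$ are defined by

$$\begin{aligned} \mathcal{B}(X, Z) &= E\left[\int_I (\dot{X}(t) + A X(t)) Z(t)\, dt\right] = \int_{I \times \Omega} (\dot{X}(t, \omega) + A(\omega) X(t, \omega)) Z(t, \omega)\, dt\, P(d\omega) \quad \text{and} \\ \mathcal{L}(Z) &= E\left[\int_I Y(t) Z(t)\, dt\right] = \int_{I \times \Omega} Y(t, \omega) Z(t, \omega)\, dt\, P(d\omega), \end{aligned} \tag{7.48}$$

that is, the integrals of the left and the right sides of $\dot{X}(t) + A X(t) = Y(t)$ multiplied
by $Z \in \mathcal{V}$ calculated over $I \times \Omega$ under the measure $\lambda \times P$. The functional $\mathcal{L}$ is
linear and continuous, and therefore bounded, and $\mathcal{B}$ is bilinear, bounded, and weakly
coercive. According to Theorem 7.14, $\dot{X}(t) + A X(t) = Y(t)$ admits a unique solution
$X_Y \in \mathcal{W}$ such that $\mathcal{B}(X_Y, Z) = \mathcal{L}(Z)$, $\forall Z \in \mathcal{V}$. ◊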

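Proof The set $\mathcal{V}$ is a Hilbert space since it coincides with $L_2(I \times \Omega, \mathcal{B}(I) \times \mathcal{F}, \lambda \times P)$.
Since $U \in \mathcal{W}$ implies that $U$ has finite variance, $\mathcal{W}$ is a subset of $\mathcal{V}$. Moreover,
$\mathcal{W}$ is a linear subspace of $\mathcal{V}$ that is complete with the norm in (7.47), so that it is a
Hilbert space.

The functional $\mathcal{L}$ is linear by its definition and properties of integrals. It is continuous
since, for $Z_1, Z_2 \in \mathcal{V}$, we have

$$\begin{aligned} |\mathcal{L}(Z_1) - \mathcal{L}(Z_2)| &= \left| \int_{I \times \Omega} Y(t, \omega)(Z_1(t, \omega) - Z_2(t, \omega))\, dt\, P(d\omega) \right| \\ &\le \left( \int_{I \times \Omega} Y(t, \omega)^2\, dt\, P(d\omega) \right)^{1/2} \left( \int_{I \times \Omega} (Z_1(t, \omega) - Z_2(t, \omega))^2\, dt\, P(d\omega) \right)^{1/2} \end{aligned}$$

by the Cauchy-Schwarz inequality, that is, $|\mathcal{L}(Z_1) - \mathcal{L}(Z_2)| \le \|Y\|_{\mathcal{V}}\, \|Z_1 - Z_2\|_{\mathcal{V}}$.
Since $Y \in \mathcal{V}$, it has finite variance so that $\mathcal{L}$ is continuous and, therefore, bounded
(Theorem B.23).

The functional $\mathcal{B}$ is bilinear by its definition. It is bounded since

$$\begin{aligned} |\mathcal{B}(X, Z)| &\le \left| \int_{I \times \Omega} \dot{X}(t, \omega) Z(t, \omega)\, dt\, P(d\omega) \right| + \left| \int_{I \times \Omega} A(\omega) X(t, \omega) Z(t, \omega)\, dt\, P(d\omega) \right| \\ &\le \left( \int_{I \times \Omega} \dot{X}(t, \omega)^2\, dt\, P(d\omega) \right)^{1/2} \left( \int_{I \times \Omega} Z(t, \omega)^2\, dt\, P(d\omega) \right)^{1/2} \\ &\quad + \left( \int_{I \times \Omega} A(\omega)^2 X(t, \omega)^2\, dt\, P(d\omega) \right)^{1/2} \left( \int_{I \times \Omega} Z(t, \omega)^2\, dt\, P(d\omega) \right)^{1/2} \\ &= \|\dot{X}\|_{\mathcal{V}} \|Z\|_{\mathcal{V}} + \|AX\|_{\mathcal{V}} \|Z\|_{\mathcal{V}} = \bigl( \|\dot{X}\|_{\mathcal{V}} + \|AX\|_{\mathcal{V}} \bigr) \|Z\|_{\mathcal{V}} \\ &\le (a_2 \vee 1) \bigl( \|\dot{X}\|_{\mathcal{V}} + \|X\|_{\mathcal{V}} \bigr) \|Z\|_{\mathcal{V}} \le \sqrt{2}\, (a_2 \vee 1)\, \|X\|_{\mathcal{W}} \|Z\|_{\mathcal{V}}, \end{aligned}$$

by the Cauchy-Schwarz inequality, properties of the random variable $A$, and the
inequality $\|\dot{X}\|_{\mathcal{V}} + \|X\|_{\mathcal{V}} \le \sqrt{2}\, \|X\|_{\mathcal{W}}$.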

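We now show that the conditions of Theorem 7.14 are satisfied. For the first
condition, take $X \in \mathcal{W}$ and $Z = \dot{X} + AX$. Since $Z \in \mathcal{V}$, we have

$$\begin{aligned} \|Z\|_{\mathcal{V}}^2 &= \int_{I \times \Omega} (\dot{X}(t, \omega) + A(\omega) X(t, \omega))^2\, dt\, P(d\omega) \le 2\bigl( \|\dot{X}\|_{\mathcal{V}}^2 + \|AX\|_{\mathcal{V}}^2 \bigr) \le 2 (a_2 \vee 1)^2 \|X\|_{\mathcal{W}}^2, \\ |\mathcal{B}(X, Z)| &= \int_{I \times \Omega} Z(t, \omega)^2\, dt\, P(d\omega) = \|Z\|_{\mathcal{V}}^2, \quad \text{and} \\ \|Z\|_{\mathcal{V}}^2 &= \|\dot{X}\|_{\mathcal{V}}^2 + \|AX\|_{\mathcal{V}}^2 + 2 E\left[ \int_I \dot{X}(t) A X(t)\, dt \right] \ge (a_1 \wedge 1)^2 \|X\|_{\mathcal{W}}^2, \end{aligned}$$

by using the inequality $(a + b)^2 \le 2(a^2 + b^2)$, $a, b \in \mathbb{R}$, the definitions of $\mathcal{B}$ and $Z$,
and the equality $2E[\int_I \dot{X}(t) A X(t)\, dt] = E[A X(\tau)^2] \ge 0$ resulting by integration
by parts with $X(0) = 0$. We have $|\mathcal{B}(X, Z/\|Z\|_{\mathcal{V}})| = \|Z\|_{\mathcal{V}} \ge (\text{const})\, \|X\|_{\mathcal{W}}$
for $Z \in \mathcal{V}$ of the form $Z = \dot{X} + AX$ so that
$\sup_{Z \in \mathcal{V},\, \|Z\|_{\mathcal{V}} = 1} |\mathcal{B}(X, Z)| \ge (\text{const})\, \|X\|_{\mathcal{W}}$, $\forall X \in \mathcal{W}$, that is, the first condition in (7.45).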

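For the second condition in (7.45), take $Z \in \mathcal{V}$ and set $X(t) = \int_0^t Z(s)\, ds$, $t \in [0, \tau]$,
so that $\dot{X}(t) = Z(t)$ since $\int_0^t Z(s)\, ds$ is defined in the m.s. sense, and $X \in \mathcal{W}$.
We have

$$\begin{aligned} \mathcal{B}(X, Z) &= E\left[ \int_I \left( Z(t) + A \int_0^t Z(s)\, ds \right) Z(t)\, dt \right] \\ &= E\left[ \int_I Z(t)^2\, dt + A \int_I \left( \int_0^t Z(s)\, ds \right) Z(t)\, dt \right] \\ &= E\left[ \int_I Z(t)^2\, dt + \frac{A}{2} \left( \int_I Z(t)\, dt \right)^2 \right] > 0 \end{aligned}$$

by using integration by parts, $A > 0$ a.s., $(\int_0^\tau Z(t)\, dt)^2 \ge 0$, and $E[\int_I Z(t)^2\, dt] > 0$
for $Z \ne 0$. The second condition in (7.45) follows from the observation that $|\mathcal{B}(X, Z)| > 0$ for
$Z \in \mathcal{V} \setminus \{0\}$ and that the supremum of $|\mathcal{B}(X, Z)|$ taken over all $X \in \mathcal{W}$ is at least as large as
$|\mathcal{B}(X, Z)|$ for the definition of $X$ used in these calculations. ▲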

Example 7.25 Suppose that $A$ in Example 7.24 is a real-valued stochastic process
$A(t)$, so that $X(t)$ is the solution of $\dot{X}(t) + A(t) X(t) = Y(t)$, $t \in I = [0, \tau]$, with
$X(0) = 0$. It is assumed that $A(t)$ and $Y(t)$ are independent processes defined on a
probability space $(\Omega, \mathcal{F}, P)$ and that $P(\inf_{t \in I} A(t) > a_1,\ \sup_{t \in I} A(t) < a_2) = 1$
for some constants $0 < a_1 < a_2 < \infty$. Let $\mathcal{W}$, $\mathcal{V}$, and $\mathcal{L}$ be as in Example 7.24,
and let $\mathcal{B} : \mathcal{W} \times \mathcal{V} \to \mathbb{R}$ be given by

$$\mathcal{B}(X, Z) = E\left[ \int_I \bigl( \dot{X}(t) + A(t) X(t) \bigr) Z(t)\, dt \right], \tag{7.49}$$

which constitutes a direct extension of $\mathcal{B}$ in (7.48).

We have seen in Example 7.24 that $\mathcal{L}$ is a linear functional that is continuous
and, therefore, bounded. That $\mathcal{B}$ is bilinear follows from its definition. Arguments
similar to those used in Example 7.24 show that $\mathcal{B}$ is bounded and weakly coercive,
so that the stochastic equation $\dot{X}(t) + A(t) X(t) = Y(t)$, $t \in I = [0, \tau]$, admits a
unique weak solution (Theorem 7.14). ◊

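Proof The set $\mathcal{W}$ is included in $\mathcal{V}$ since, for $X \in \mathcal{W}$ and almost all $\omega \in \Omega$, we have

$$|X(t, \omega)| = \left| \int_0^t \dot{X}(s, \omega)\, ds \right| \le \left( \int_0^\tau ds \right)^{1/2} \left( \int_0^\tau \dot{X}(s, \omega)^2\, ds \right)^{1/2} = \tau^{1/2}\, \|\dot{X}(\cdot, \omega)\|_{L_2(I)},$$

by the Cauchy-Schwarz inequality, or $\|X(\cdot, \omega)\|_{L_2(I)} \le \tau\, \|\dot{X}(\cdot, \omega)\|_{L_2(I)}$ a.s., which
gives $\|X\|_{\mathcal{V}} \le \tau\, \|\dot{X}\|_{\mathcal{V}}$ by expectation. Hence, $X \in \mathcal{W}$ implies $\|X\|_{\mathcal{V}} < \infty$, that
is, $X \in \mathcal{V}$.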

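For $X \in \mathcal{W}$ and $Z \in \mathcal{V}$, we have

$$\begin{aligned} |\mathcal{B}(X, Z)| &= \left| \int_{I \times \Omega} \bigl( \dot{X}(t, \omega) Z(t, \omega) + A(t, \omega) X(t, \omega) Z(t, \omega) \bigr)\, dt\, P(d\omega) \right| \\ &\le \left| \int_{I \times \Omega} \dot{X}(t, \omega) Z(t, \omega)\, dt\, P(d\omega) \right| + \left| \int_{I \times \Omega} A(t, \omega) X(t, \omega) Z(t, \omega)\, dt\, P(d\omega) \right| \\ &\le \|\dot{X}\|_{\mathcal{V}} \|Z\|_{\mathcal{V}} + \|AX\|_{\mathcal{V}} \|Z\|_{\mathcal{V}} = \bigl( \|\dot{X}\|_{\mathcal{V}} + \|AX\|_{\mathcal{V}} \bigr) \|Z\|_{\mathcal{V}} \\ &\le \sqrt{2}\, (a_2 \vee 1)\, \|X\|_{\mathcal{W}} \|Z\|_{\mathcal{V}}, \end{aligned}$$

that is, $\mathcal{B}$ is bounded.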

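For the first condition in (7.45), take $X \in \mathcal{W}$ and set $Z = \dot{X} + AX \in \mathcal{V}$. Direct
calculations give

$$\begin{aligned} \|Z\|_{\mathcal{V}}^2 &= \int_{I \times \Omega} \bigl( \dot{X}(t, \omega) + A(t, \omega) X(t, \omega) \bigr)^2\, dt\, P(d\omega) \le 2\bigl( \|\dot{X}\|_{\mathcal{V}}^2 + \|AX\|_{\mathcal{V}}^2 \bigr) \le 2 (a_2 \vee 1)^2 \|X\|_{\mathcal{W}}^2, \\ |\mathcal{B}(X, Z)| &= E\left[ \int_I \bigl( \dot{X}(t) + A(t) X(t) \bigr)^2\, dt \right] = \|Z\|_{\mathcal{V}}^2, \quad \text{and} \\ \|Z\|_{\mathcal{V}}^2 &= \|\dot{X}\|_{\mathcal{V}}^2 + \|AX\|_{\mathcal{V}}^2 + 2 E\left[ \int_I A(t) X(t) \dot{X}(t)\, dt \right]. \end{aligned}$$

The expectation $E[\int_I A(t) X(t) \dot{X}(t)\, dt]$ is bounded since $\|\dot{X}\|_{\mathcal{V}}^2 + \|AX\|_{\mathcal{V}}^2 > 0$
and $\|Z\|_{\mathcal{V}}^2 > 0$ are finite. Let $c > 0$ be a constant such that $\|Z\|_{\mathcal{V}}^2 \ge c^2 \|X\|_{\mathcal{W}}^2$,
then $|\mathcal{B}(X, Z/\|Z\|_{\mathcal{V}})| = \|Z\|_{\mathcal{V}} \ge c\, \|X\|_{\mathcal{W}}$ for $Z = \dot{X} + AX$, as in the previous
example. This implies $\sup_{Z \in \mathcal{V},\, \|Z\|_{\mathcal{V}} = 1} |\mathcal{B}(X, Z)| \ge (\text{const})\, \|X\|_{\mathcal{W}}$ for all $X \in \mathcal{W}$.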
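
For the second condition in (7.45), set $X(t) = \int_0^t A(s) Z(s)\, ds$ for an arbitrary
$Z \in \mathcal{V}$. Then

$$\begin{aligned} \mathcal{B}(X, Z) &= E\left[ \int_I \left( \dot{X}(t) + A(t) \int_0^t A(s) Z(s)\, ds \right) Z(t)\, dt \right] \\ &= E\left[ \int_I A(t) Z(t)^2\, dt + \int_I A(t) Z(t) \left( \int_0^t A(s) Z(s)\, ds \right) dt \right] \\ &= E\left[ \int_I A(t) Z(t)^2\, dt \right] + \frac{1}{2} E\left[ \left( \int_I A(t) Z(t)\, dt \right)^2 \right] \end{aligned}$$

by performing integration by parts on the integral $\int_I A(t) Z(t) (\int_0^t A(s) Z(s)\, ds)\, dt$.
Since $A(t) > 0$ a.s., we have $\mathcal{B}(X, Z) \ge 0$ for arbitrary $Z \in \mathcal{V}$ and $\mathcal{B}(X, Z) > 0$
for $Z \in \mathcal{V} \setminus \{0\}$. Since $Z$ is arbitrary, the second condition in (7.45) holds. ▲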

Recall that we have established in Sect. 5.5.1 conditions for the existence and
uniqueness of strong solutions for stochastic differential equations driven by Gaussian
and semimartingale noise. The theorems in Sect. 5.5.1 offer alternatives to Theorem
7.14 since they can also be applied to differential equations with uncertain coefficients
and colored driving noise. For example, if $Y(t) = dB(t)/dt$ in Example 7.24
is a Gaussian white noise, then $\dot{X}(t) + A X(t) = Y(t)$ in this example can be given
in the form

$$\begin{cases} dX_1(t) = -X_2(t) X_1(t)\, dt + dB(t) \\ dX_2(t) = 0, \end{cases}$$

with initial conditions $X_1(0) = 0$ and $X_2(0) = A$, where $(X_1 = X, X_2 = A)$. The
diffusion process $(X_1, X_2)$ defined by these equations is unique in the strong sense
if its drift and diffusion coefficients satisfy the conditions in Theorem 5.8. If $Y(t)$ in
Example 7.24 is a colored noise that can be described by the output of a linear filter
driven by white noise, then $(X_1, X_2)$ augmented with the state vector of this filter
satisfies a stochastic differential equation driven by white noise.
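The augmented system above can be integrated numerically once a sample of $A$ is fixed. The sketch below uses the Euler-Maruyama scheme, which is a choice made for this illustration and not a method prescribed by the text; the function name, step count, and seed are likewise assumptions.

```python
import math
import random


def euler_maruyama_augmented(a_sample, tau=1.0, n_steps=1000, seed=0):
    """One sample path of (X_1, X_2) with dX_1 = -X_2 X_1 dt + dB and dX_2 = 0."""
    rng = random.Random(seed)
    dt = tau / n_steps
    x1, x2 = 0.0, a_sample               # initial conditions X_1(0) = 0, X_2(0) = A
    path = [(x1, x2)]
    for _ in range(n_steps):
        db = rng.gauss(0.0, math.sqrt(dt))   # Brownian increment over one step
        x1 = x1 - x2 * x1 * dt + db
        x2 = x2 + 0.0 * dt                   # the second coordinate does not move
        path.append((x1, x2))
    return path
```

The random coefficient enters only through the initial condition of $X_2$, so sampling $A$ once and calling the function produces one sample of the state of the original equation.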

We conclude this section with an outline of the methods for solving approximately
differential equations with random coefficients and input discussed in the following
sections. The Monte Carlo, conditional analysis, state augmentation, stochastic
reduced order models, stochastic Galerkin, and stochastic collocation methods are
applied in Sects. 7.4.2-7.4.8 to solve stochastic differential equations with random
coefficients of arbitrary uncertainty. The methods in Sect. 7.4.9 are for stochastic
differential equations with random coefficients of small uncertainty.


7.4.2 Monte Carlo Simulation 

Monte Carlo simulation is the most general method for solving SDEs with random
coefficients. Its implementation requires complete information on the probability law
of both random input and coefficients. Deterministic solvers can be used to calculate
samples of the solutions of SDEs from samples of their random coefficients and
input processes. The resulting samples can be used to estimate moments and other
properties of the solutions of these equations.

Note that Monte Carlo simulation delivers samples of strong solutions of SDEs
since they correspond to input/coefficient samples and the generation of these samples
requires selecting versions for all random elements. For example, the generation of
samples of the real-valued process X(t) defined by $dX(t) = -A X(t)\,dt + dB(t)$
in Example 7.23 requires specifying the version of the Brownian motion B(t) and
the probability law of A. Also, the calculation of solution samples requires that the
random elements of SDEs be such that these equations have unique solutions in the
strong sense (Sect. 5.5.1).
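The procedure above can be sketched in a few lines. The following is a minimal illustration, not the implementation used in the text: it assumes, hypothetically, that A ~ U(a1, a2), that X(0) is Gaussian with mean mu0 and variance gamma0, and that paths of $dX(t) = -A X(t)\,dt + dB(t)$ are generated by the Euler-Maruyama scheme. The function names and all numerical values are illustrative choices.

```python
import math
import random


def simulate_path(a, x0, t_end, dt, rng):
    """Euler-Maruyama sample of X(t_end) for dX = -a X dt + dB with a fixed."""
    n_steps = int(round(t_end / dt))
    x = x0
    for _ in range(n_steps):
        # Brownian increment over a step of length dt is N(0, dt)
        x += -a * x * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x


def monte_carlo_mean(n_samples, a1, a2, mu0, gamma0, t_end, dt, seed=0):
    """Estimate E[X(t_end)] by averaging over samples of A, X(0), and B."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        a = rng.uniform(a1, a2)                   # sample of the coefficient A (assumed uniform)
        x0 = rng.gauss(mu0, math.sqrt(gamma0))    # sample of the initial state
        total += simulate_path(a, x0, t_end, dt, rng)
    return total / n_samples


def exact_mean(a1, a2, mu0, t):
    """E[mu0 exp(-A t)] for A ~ U(a1, a2), used only as a reference value."""
    return mu0 * (math.exp(-a1 * t) - math.exp(-a2 * t)) / ((a2 - a1) * t)


estimate = monte_carlo_mean(2000, 0.5, 1.5, 3.0, 1.0, 1.0, 0.01)
reference = exact_mean(0.5, 1.5, 3.0, 1.0)
print(estimate, reference)
```

Each sample requires one sample of A, one of X(0), and one Brownian path, which is the sense in which versions of all random elements must be selected.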


7.4.3 Conditional Analysis 

Let X(t) be the solution of (7.2) and assume that the probability law of the conditional
process $X(t) \mid \Theta$ is available analytically or constitutes the output of an
efficient algorithm.

Example 7.26 Let X(t) be as in Example 7.23, so that $\Theta = A$ and $X(t) \mid A$ is
an Ornstein-Uhlenbeck process, that is, a Gaussian process with mean
$\mu(t; A) = \mu_0 \exp(-A t)$, variance function
$\gamma(t; A) = \gamma_0 \exp(-2 A t) + (1 - \exp(-2 A t))/(2A)$,
and covariance function $c(s, t; A) = \gamma(s \wedge t; A) \exp(-A |s - t|)$. The density of the
conditional vector $(X(t_1), \ldots, X(t_n)) \mid A$ is

\[
f(x \mid A) = \big[(2\pi)^n \det(\gamma(A))\big]^{-1/2}
\exp\Big\{-\tfrac{1}{2}\,(x - \mu(A))'\,\gamma(A)^{-1}\,(x - \mu(A))\Big\},
\]

where $x = (x_1, \ldots, x_n)'$, $\mu(A) = (\mu(t_1; A), \ldots, \mu(t_n; A))'$, and
$\gamma(A) = \{c(t_i, t_j; A),\ i, j = 1, \ldots, n\}$.

Properties of the unconditional solution X(t) can be obtained from the second
moment properties and the finite dimensional densities of $X(t) \mid A$ by direct
integration or Monte Carlo simulation. For example, the mean of X(t) is the expectation
$\mu(t) = E[\mu_0 \exp(-A t)]$, which can be obtained by numerical integration or can be
estimated from

\[
\hat\mu(t) = \frac{1}{n_s} \sum_{k=1}^{n_s} \mu_0 \exp(-a_k t),
\]

where $\{a_k,\ k = 1, \ldots, n_s\}$ are $n_s$ independent samples of A. ◇
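The estimate of the mean by conditional analysis can be sketched as follows. This is a minimal illustration under the assumption, made here only for the example, that A ~ U(a1, a2); the unconditional variance is obtained from the conditional mean and variance functions of the Ornstein-Uhlenbeck process through the law of total variance. Function names and parameter values are hypothetical.

```python
import math
import random


def conditional_mean(t, a, mu0):
    """Mean of X(t) | A = a for the Ornstein-Uhlenbeck process."""
    return mu0 * math.exp(-a * t)


def conditional_variance(t, a, gamma0):
    """Variance of X(t) | A = a for the Ornstein-Uhlenbeck process."""
    return gamma0 * math.exp(-2.0 * a * t) + (1.0 - math.exp(-2.0 * a * t)) / (2.0 * a)


def unconditional_moments(t, a_samples, mu0, gamma0):
    """Average conditional moments over samples of A (law of total variance)."""
    n = len(a_samples)
    means = [conditional_mean(t, a, mu0) for a in a_samples]
    second = [conditional_variance(t, a, gamma0) + m * m
              for a, m in zip(a_samples, means)]
    mean = sum(means) / n
    variance = sum(second) / n - mean * mean
    return mean, variance


rng = random.Random(0)
a_samples = [rng.uniform(0.5, 1.5) for _ in range(5000)]   # assumed law of A
mean_hat, var_hat = unconditional_moments(1.0, a_samples, 3.0, 1.0)
# Reference value E[mu0 exp(-A t)] for A ~ U(0.5, 1.5), mu0 = 3, t = 1
mean_exact = 3.0 * (math.exp(-0.5) - math.exp(-1.5))
print(mean_hat, mean_exact, var_hat)
```

No Brownian paths are generated here; only the coefficient A is sampled, which is why the approach is efficient when the conditional law is known.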


The solution in Example 7.26 by conditional analysis is efficient since the
probability law of $X(t) \mid A$ is known as a function of $\Theta = A$. Generally, the probability
law of $X(t) \mid \Theta$ is not available analytically, so that this approach is rarely useful in
applications. For example, the probability law of $X(t) \mid \Theta$ in Example 7.26 cannot
be expressed as a function of $\Theta = A$ if the Gaussian driving noise is replaced with
an arbitrary non-Gaussian process.


7.4.4 Conditional Monte Carlo Simulation 

Consider a special case of (7.1) with real-valued state X(t) and coefficients that do
not depend explicitly on time. We denote Y(t) in (7.1) by $\xi(t)$ since this process has
some special features. Specifically, X(t) is the solution of

\[ dX(t) = a(X(t), \xi(t))\,dt + b(X(t), \xi(t))\,dB(t), \quad t \ge 0, \tag{7.50} \]

where B(t) denotes a standard Brownian motion and $\xi(t)$ is a semi-Markov process
with values in a finite set $\{\theta_1, \ldots, \theta_n\}$ and transition rates
$\{q_{lk}(t),\ k, l = 1, \ldots, n\}$ from state $\theta_l$ to state $\theta_k$ at time t. Let
$T_0 < T_1 < \cdots < T_r < \cdots$ denote random transition times defined by the
recurrence formula

\[ T_r = T_{r-1} + S_r, \quad r = 1, 2, \ldots, \tag{7.51} \]

where $T_0 = 0$ and $\{S_r\}$ are iid random variables with finite mean. Since $\xi(t)$ is
constant in $[T_{r-1}, T_r)$, $r = 1, 2, \ldots$, X(t) in (7.50) is a diffusion process during
the time intervals $[T_{r-1}, T_r)$ with drift and diffusion coefficients $a(X(t), \xi(t))$ and
$b(X(t), \xi(t))$. It is assumed that the drift and diffusion coefficients are such that
(7.50) has a solution that is unique for all values of $\xi(t)$ (Sect. 5.5.1.1 in this book,
[4], Sect. 4.7.1.1).

We present two methods for calculating properties of X(t). The first method,
proposed in [7-11], develops integral equations for marginal moments and other
properties of X(t) under some rather restrictive conditions. The second method, referred
to as conditional Monte Carlo simulation, uses random vibration theory to find
properties of the conditional process $X(t) \mid \xi(t)$ in each time interval $[T_{r-1}, T_r)$.
Unconditional properties of X(t) are obtained by averaging properties of $X(t) \mid \xi(t)$
over samples of $\xi(t)$. Conditional Monte Carlo simulation is particularly efficient for
stochastic differential equations with drift and diffusion coefficients that are linear
in and independent of X(t), respectively. In this case, $X(t) \mid \xi(\cdot)$ is Gaussian in the
intervals of constant values of $\xi(t)$.

We now review the first method that delivers integral equations for moments and
other state properties. Let $h : \mathbb{R} \to \mathbb{R}$ be a measurable function and denote by

\[ G_k(t, x) = E[h(X(t)) \mid X(0) = x,\ \xi(0) = \theta_k], \quad t \ge 0, \tag{7.52} \]

the expectation of h(X(t)) conditional on $(X(0) = x,\ \xi(0) = \theta_k)$ calculated on
a sample of Brownian motion B(t). The latter condition is not indicated explicitly
for simplicity. Depending on the functional form of h, $G_k(t, x)$ provides various
properties for X(t). For example, $G_k(t, x)$ is the conditional expectation of X(t) if
h(x) = x, the real or the imaginary part of the conditional characteristic function
of X(t) if $h(x) = \cos(u x)$ or $h(x) = \sin(u x)$, $u \in \mathbb{R}$, and the conditional marginal
distribution of X(t) at z if $h(x) = 1(x \le z)$.

We follow the arguments in [7-11] to develop an integral equation for
$E[X(t) \mid X(0) = x,\ \xi(0) = \theta_k]$. If $\xi$ has no transition in [0, t) and $\xi(0) = \theta_k$, then
$X(t) = X_k(t)$ satisfies the stochastic differential equation
$dX_k(t) = a_k(X_k(t))\,dt + b_k(X_k(t))\,dB(t)$, $t \ge 0$, where $a_k(\cdot) = a(\cdot, \theta_k)$ and
$b_k(\cdot) = b(\cdot, \theta_k)$. The probability of this event is $1 - \int_0^t Q_k(\tau)\,d\tau$, where

\[ Q_k(\tau) = \sum_{l=1,\, l \ne k}^{n} q_{kl}(\tau), \tag{7.53} \]

is the rate of transition out of $\theta_k$ at time $\tau$.

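Suppose $\xi$ has a jump at time $\tau \in [0, t)$, and its value changes from $\theta_k$ to $\theta_l$. The
rate of this change at time $\tau$ is $q_{kl}(\tau)$. The expectation of h(X(t)) conditional on
$(X(\tau),\ \xi(\tau) = \theta_l)$ is $G_l(t - \tau, X(\tau))$, $t \ge \tau$, on a sample of Brownian motion B(t),
so that

\[
\begin{aligned}
G_k(t, x)
&= E\big[h(X(t)) \mid X(0) = x,\ \xi(0) = \theta_k,\ \xi \text{ has no jump in } [0, t)\big]
   \Big(1 - \int_0^t Q_k(\tau)\,d\tau\Big) \\
&\quad + \int_0^t \sum_{l=1,\, l \ne k}^{n}
   E\big[h(X(t)) \mid X(0) = x,\ \xi(0) = \theta_k,\ \xi \text{ jumps to } \theta_l \text{ at } \tau \in [0, t)\big]\,
   q_{kl}(\tau)\,d\tau \\
&= h(X_k(t)) \Big(1 - \int_0^t Q_k(\tau)\,d\tau\Big)
   + \int_0^t \sum_{l=1,\, l \ne k}^{n} G_l(t - \tau, X_k(\tau))\, q_{kl}(\tau)\,d\tau
\end{aligned}
\tag{7.54}
\]

by arguments of the renewal theory ([29], Chap. 3).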

Generally, (7.54) cannot be solved analytically. Numerical solutions of this
equation are impractical since they require discretization of both spatial and temporal
coordinates, $X_k(\tau)$ can be a d-dimensional process, d > 1, and $G_k(t, x)$ corresponds
to a single sample of the driving noise. It seems that (7.54) can be solved efficiently
only if the drift $a(X(t), \xi(t))$ is linear in X(t) and $b(X(t), \xi(t)) = 0$, that is, a linear
dynamic system with no driving noise. In this case, we have $G_k(t, x) = x\,G_k(t)$ and
$G_l(t - \tau, X_k(\tau)) = X_k(\tau)\,G_l(t - \tau)$ so that (7.54) becomes

\[
x\,G_k(t) = h(X_k(t)) \Big(1 - \int_0^t Q_k(\tau)\,d\tau\Big)
+ \int_0^t \sum_{l=1,\, l \ne k}^{n} X_k(\tau)\, G_l(t - \tau)\, q_{kl}(\tau)\,d\tau.
\tag{7.55}
\]

The expectations $G_k(t)$ can be obtained numerically from, for example, the Laplace
transform of (7.55), as illustrated in [9].

Consider now the second method, referred to as conditional Monte Carlo simulation.
The method calculates properties of the state of differential and difference equations
with random coefficients and input in two steps. First, random vibration theory is
used to calculate properties of the conditional process $X(t) \mid \xi(\cdot)$ on samples of
$\xi(t)$. Second, unconditional properties of X(t) are obtained by averaging conditional
properties of $X(t) \mid \xi(\cdot)$ over samples of $\xi(t)$.

Let $\xi(t, \omega)$ be a sample of the semi-Markov process $\xi(t)$ in (7.50) defined by
a sequence of jump times $0 = T_0(\omega) < T_1(\omega) < \cdots < T_r(\omega) < \cdots$ and states
$\xi_r(\omega) = \xi(T_r(\omega), \omega) \in \{\theta_1, \ldots, \theta_n\}$. Since $\xi(t, \omega)$ is constant during the time
intervals $[T_{r-1}(\omega), T_r(\omega))$, $r = 1, 2, \ldots$, we have

\[
\xi(t, \omega) = \sum_{r \ge 1} \xi_r(\omega)\, 1\big(T_{r-1}(\omega) \le t < T_r(\omega)\big), \quad t \ge 0. \tag{7.56}
\]

We have seen that the solution $X_r(t)$ of (7.50) with $\xi(t, \omega)$ in place of $\xi(t)$ in
$[T_{r-1}(\omega), T_r(\omega))$ is a diffusion process defined by the stochastic differential equation

\[ dX_r(t) = a_r(X_r(t))\,dt + b_r(X_r(t))\,dB(t), \quad t \in [T_{r-1}(\omega), T_r(\omega)), \tag{7.57} \]

with initial state $X_{r-1}(T_{r-1}(\omega))$, drift $a_r(\cdot) = a(\cdot, \xi_r(\omega))$, and diffusion
$b_r(\cdot) = b(\cdot, \xi_r(\omega))$. If the drift and diffusion coefficients in (7.50) are such that this
equation has a unique solution almost surely, the processes satisfying (7.57) exist and are
uniquely defined. The notation $X_r(t)$ may be misleading since it does not emphasize
the dependence on $\xi(t, \omega)$, that is, the fact that $X_r(t)$ is the conditional process
$X(t) \mid \xi(\cdot, \omega)$ restricted to the time interval $[T_{r-1}(\omega), T_r(\omega))$. We use this notation for
simplicity.

Efficient methods for calculating statistics of the conditional processes
$\{X_r(t),\ r = 1, 2, \ldots\}$ are not available for arbitrary equations. However, properties of
these processes can be obtained efficiently in cases of practical interest, for example,
differential equations with linear drift and state-invariant diffusion, that is, equations
of the type

\[ dX_r(t) = a_r X_r(t)\,dt + b_r\,dB(t), \quad t \in [T_{r-1}(\omega), T_r(\omega)), \tag{7.58} \]

where $a_r$ and $b_r$ denote $a(\xi(t))$ and $b(\xi(t))$ in the time interval $[T_{r-1}(\omega), T_r(\omega))$.
Equations of this type are studied in linear random vibration. The probability law
of the Gaussian processes $X_r(t)$ is completely defined by their second moment
properties, which can be obtained from the mean and covariance equations in (7.6).

We first illustrate the second method by the solution X(t) of the stochastic
differential equation

\[ dX(t) = -\xi(t)\,X(t)\,dt + dB(t), \quad t \ge 0, \tag{7.59} \]

that is, a linear version of (7.50) with drift $a(X(t), \xi(t)) = -\xi(t)\,X(t)$, $\xi(t) > 0$,
diffusion $b(X(t), \xi(t)) = 1$, and initial state $X(0) = X_0$, a Gaussian variable with
mean $\mu_0$ and variance $\gamma_0$ that is independent of B(t).

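Theorem 7.15 Let X(t) be defined by (7.59). The mean and the variance functions,
$\mu_r(t)$ and $\gamma_r(t)$, of the conditional processes $\{X_r(t)\}$ on a sample $\xi(t, \omega)$ of $\xi(t)$ are

\[
\begin{aligned}
\mu_r(t) &= \mu_{r-1}(T_{r-1}(\omega))\, e^{-\xi_r(\omega)(t - T_{r-1}(\omega))} \\
\gamma_r(t) &= \gamma_{r-1}(T_{r-1}(\omega))\, e^{-2\xi_r(\omega)(t - T_{r-1}(\omega))}
  + \frac{1 - e^{-2\xi_r(\omega)(t - T_{r-1}(\omega))}}{2\,\xi_r(\omega)}
\end{aligned}
\tag{7.60}
\]

for $t \in [T_{r-1}(\omega), T_r(\omega)]$, so that the recurrence formulas

\[
\begin{aligned}
\mu_r(T_r(\omega)) &= \mu_{r-1}(T_{r-1}(\omega))\, e^{-\xi_r(\omega)(T_r(\omega) - T_{r-1}(\omega))} \\
\gamma_r(T_r(\omega)) &= \gamma_{r-1}(T_{r-1}(\omega))\, e^{-2\xi_r(\omega)(T_r(\omega) - T_{r-1}(\omega))}
  + \frac{1 - e^{-2\xi_r(\omega)(T_r(\omega) - T_{r-1}(\omega))}}{2\,\xi_r(\omega)}
\end{aligned}
\tag{7.61}
\]

hold for $r = 1, 2, \ldots$ with $\mu_0(T_0(\omega)) = \mu_0$ and $\gamma_0(T_0(\omega)) = \gamma_0$.

Proof Results in (7.60) are solutions of (7.6). For example, the mean equation is
$\dot\mu_r(s) = -\xi_r(\omega)\,\mu_r(s)$, $s \in [0, T_r(\omega) - T_{r-1}(\omega))$, with initial condition
$\mu_r(0) = \mu_{r-1}(T_{r-1}(\omega))$. ▲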
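The recurrence for the conditional mean and variance on one sample of $\xi(t)$ can be sketched as below. This is an illustrative sketch, not the author's code: it solves the mean and variance equations of $dX = -\xi_r X\,dt + dB$ interval by interval, so that the variance update carries the term $(1 - e^{-2\xi_r \Delta})/(2\xi_r)$ contributed by the driving noise. The function name and the layout of the inputs (a list of jump times and a list of states) are assumptions made for the example.

```python
import math


def conditional_moments(t, jump_times, xi_values, mu0, gamma0):
    """Mean and variance of X(t) given one sample path of xi.

    jump_times = [T_0 = 0, T_1, ..., T_R] with T_R >= t, and xi_values[r - 1]
    is the (positive) value of xi on the interval [T_{r-1}, T_r).
    """
    mu, gamma = mu0, gamma0
    for r, xi in enumerate(xi_values, start=1):
        t_start, t_stop = jump_times[r - 1], jump_times[r]
        if t <= t_start:
            break
        dt = min(t, t_stop) - t_start          # time spent in the r-th interval
        decay = math.exp(-xi * dt)
        mu = mu * decay                        # mean equation: d(mu)/dt = -xi mu
        # variance equation: d(gamma)/dt = -2 xi gamma + 1
        gamma = gamma * decay ** 2 + (1.0 - decay ** 2) / (2.0 * xi)
    return mu, gamma


# Sample path with two intervals, xi = 0.7 on [0, 1) and xi = 2.0 on [1, 3)
mu_t, gamma_t = conditional_moments(2.0, [0.0, 1.0, 3.0], [0.7, 2.0], 3.0, 1.0)
print(mu_t, gamma_t)
```

When all states coincide, the recurrence reduces to the mean and variance functions of the Ornstein-Uhlenbeck process in Example 7.26, which gives a simple consistency check.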

Theorem 7.16 Let X(t) be defined by (7.59). The correlations of the random
variables $X_q(T_q(\omega))$ and $X_p(T_p(\omega))$, $0 \le p < q$, can be obtained recursively from

\[
E[X_q(T_q(\omega))\, X_p(T_p(\omega))]
= E[X_{q-1}(T_{q-1}(\omega))\, X_p(T_p(\omega))]\, e^{-\xi_q(\omega)(T_q(\omega) - T_{q-1}(\omega))}.
\tag{7.62}
\]

Proof Note that

\[
X_q(T_q(\omega)) = X_{q-1}(T_{q-1}(\omega))\, e^{-\xi_q(\omega)(T_q(\omega) - T_{q-1}(\omega))}
+ \int_{T_{q-1}(\omega)}^{T_q(\omega)} e^{-\xi_q(\omega)(T_q(\omega) - s)}\,dB(s),
\]

so that the expectation of the product $X_q(T_q(\omega))\, X_p(T_p(\omega))$ has the expression in
(7.62) since $X_p(T_p(\omega))$ is independent of future increments of the Brownian motion.
The above formula can be applied recursively beginning with $q = p + 1$ since
$E[X_q(T_q(\omega))\, X_p(T_p(\omega))]$ for this value of q depends on $E[X_p(T_p(\omega))\, X_p(T_p(\omega))]$,
which is given by (7.61). ▲

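Theorem 7.17 Let X(t) be defined by (7.59). The correlation function of the
conditional processes $\{X_r(t)\}$ is given by

\[
\begin{aligned}
E[X_p(s)\, X_q(t)]
&= E[X_{p-1}(T_{p-1}(\omega))\, X_{q-1}(T_{q-1}(\omega))]\,
   e^{-\xi_p(\omega)(s - T_{p-1}(\omega))}\, e^{-\xi_q(\omega)(t - T_{q-1}(\omega))} \\
&\quad + E\Big[X_{q-1}(T_{q-1}(\omega)) \int_{T_{p-1}(\omega)}^{s} e^{-\xi_p(\omega)(s - u)}\,dB(u)\Big]\,
   e^{-\xi_q(\omega)(t - T_{q-1}(\omega))}
\end{aligned}
\tag{7.63}
\]

for $0 < p < q$, $s \in [T_{p-1}(\omega), T_p(\omega)]$, $t \in [T_{q-1}(\omega), T_q(\omega)]$.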

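Proof Straightforward calculations yield (7.63). For example, $E[X_p(s)\, X_q(t)]$ for
$q = p + 1$ is

\[
\begin{aligned}
E[X_p(s)\, X_{p+1}(t)]
&= E\Big[\Big(X_{p-1}(T_{p-1}(\omega))\, e^{-\xi_p(\omega)(s - T_{p-1}(\omega))}
   + \int_{T_{p-1}(\omega)}^{s} e^{-\xi_p(\omega)(s - u)}\,dB(u)\Big) \\
&\qquad \times \Big(X_p(T_p(\omega))\, e^{-\xi_{p+1}(\omega)(t - T_p(\omega))}
   + \int_{T_p(\omega)}^{t} e^{-\xi_{p+1}(\omega)(t - v)}\,dB(v)\Big)\Big] \\
&= E\Big[X_{p-1}(T_{p-1}(\omega))\, X_p(T_p(\omega))\, e^{-\xi_p(\omega)(s - T_{p-1}(\omega))}
   + X_p(T_p(\omega)) \int_{T_{p-1}(\omega)}^{s} e^{-\xi_p(\omega)(s - u)}\,dB(u)\Big]\,
   e^{-\xi_{p+1}(\omega)(t - T_p(\omega))}
\end{aligned}
\]

and

\[
\begin{aligned}
E\Big[X_p(T_p(\omega)) \int_{T_{p-1}(\omega)}^{s} e^{-\xi_p(\omega)(s - u)}\,dB(u)\Big]
&= E\Big[\int_{T_{p-1}(\omega)}^{T_p(\omega)} e^{-\xi_p(\omega)(T_p(\omega) - v)}\,dB(v)
   \int_{T_{p-1}(\omega)}^{s} e^{-\xi_p(\omega)(s - u)}\,dB(u)\Big] \\
&= \int_{T_{p-1}(\omega)}^{s} e^{-\xi_p(\omega)(T_p(\omega) - u)}\, e^{-\xi_p(\omega)(s - u)}\,du
 = e^{-\xi_p(\omega)(T_p(\omega) + s)} \int_{T_{p-1}(\omega)}^{s} e^{2\xi_p(\omega) u}\,du \\
&= \frac{e^{-\xi_p(\omega)(T_p(\omega) + s)}}{2\,\xi_p(\omega)}
   \Big(e^{2\xi_p(\omega) s} - e^{2\xi_p(\omega) T_{p-1}(\omega)}\Big).
\end{aligned}
\]

Similar calculations can be performed to find correlations $E[X_p(s)\, X_q(t)]$ for
arbitrary times s and t. ▲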

We conclude the analysis of the stochastic differential equation in (7.59) with
the following comments. First, Theorems 7.15-7.17 hold for arbitrary non-Gaussian
noise equal in the second moment sense with the Brownian motion B(t), for example,
Poisson white noise defined as the formal derivative of a compound Poisson process
$C(t) = \sum_{k=1}^{N(t)} Y_k$, where N(t) is a homogeneous Poisson process with intensity
$\lambda > 0$ and $\{Y_k\}$ denote iid real-valued random variables with mean $E[Y_1] = 0$
and finite variance such that $\lambda\, E[Y_1^2] = 1$. Second, these theorems can be extended
to $\mathbb{R}^d$-valued state vectors X(t). Third, similar arguments can be used to calculate
statistics for solutions X(t) of nonlinear stochastic equations. The calculations are
feasible if properties of X(t) conditional on $\xi(t)$ can be obtained with a reasonable
effort. For example, let X(t) be a geometric Brownian motion defined by the stochastic
differential equation

\[ dX(s) = c\,X(s)\,ds + \sigma\,X(s)\,dB(s), \quad s \in [0, T_r(\omega) - T_{r-1}(\omega)), \tag{7.64} \]

where c and $\sigma$ correspond to a sample $\xi(t, \omega)$ of $\xi(t)$ during a time interval
$[T_{r-1}(\omega), T_r(\omega))$. The solution of (7.64) is (Example 5.6)

\[ X(s) = X(0)\, e^{(c - \sigma^2/2)s + \sigma B(s)}, \quad s \in [0, T_r(\omega) - T_{r-1}(\omega)), \tag{7.65} \]


Fig. 7.11 Conditional means and variances of X(t) (left panel) and mean and variance of X(t)
(right panel) for (a = 0.01, b = 3.0)


where the initial state $X(0) = X(T_{r-1}(\omega))$ is a random variable independent of
$\{B(s),\ s \ge 0\}$. Statistics of X(t) conditional on $\xi(t)$ can be obtained simply. For
example, the marginal distribution of X(s) in $[0, T_r(\omega) - T_{r-1}(\omega))$ conditional on
$X(T_{r-1}(\omega)) = x_0$ is

\[
F(x \mid x_0) = \Phi\Big(\frac{\ln(x/x_0) - (c - \sigma^2/2)\,s}{\sigma \sqrt{s}}\Big),
\quad s \in [0, T_r(\omega) - T_{r-1}(\omega)).
\tag{7.66}
\]

The unconditional marginal distribution results by averaging over X(0). Other
examples of nonlinear stochastic differential equations that admit simple solutions can be
found in [41] (Sect. 4.4).
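The conditional distribution of the geometric Brownian motion can be checked numerically with a short sketch. The code below is an illustration under assumed parameter values (c, sigma, s, and x0 are arbitrary choices, not taken from the text): it evaluates the lognormal distribution function in closed form and compares it with the fraction of samples of the explicit solution that fall below x.

```python
import math
import random


def std_normal_cdf(z):
    """Distribution function of the standard Gaussian variable."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def gbm_conditional_cdf(x, x0, c, sigma, s):
    """P(X(s) <= x | X(0) = x0) for dX = c X ds + sigma X dB, s > 0."""
    z = (math.log(x / x0) - (c - 0.5 * sigma ** 2) * s) / (sigma * math.sqrt(s))
    return std_normal_cdf(z)


def gbm_sample(x0, c, sigma, s, rng):
    """Sample of X(s) from the explicit solution, using B(s) ~ N(0, s)."""
    b_s = math.sqrt(s) * rng.gauss(0.0, 1.0)
    return x0 * math.exp((c - 0.5 * sigma ** 2) * s + sigma * b_s)


rng = random.Random(0)
x0, c, sigma, s, x = 1.0, 0.1, 0.3, 2.0, 1.2       # illustrative values
n = 20000
empirical = sum(gbm_sample(x0, c, sigma, s, rng) <= x for _ in range(n)) / n
analytical = gbm_conditional_cdf(x, x0, c, sigma, s)
print(empirical, analytical)
```

Averaging `gbm_conditional_cdf` over samples of the initial state would give the unconditional marginal distribution mentioned above.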

The remainder of this section presents two examples showing that the second 
method works even if £(f) does not take a finite number of values and can be used 
to assess the performance of degrading systems. 

Example 7.27 Let X(t) be defined by (7.59) with initial state $X(0) \sim N(\mu_0, \gamma_0)$
assumed to be independent of B(t) and $\xi(t)$. The semi-Markov process $\xi(t)$ takes
independent U(a, b) values in distinct time intervals $[T_{r-1}, T_r)$, where $T_r$ are the
jump times of a homogeneous Poisson process with intensity $\lambda > 0$. Numerical
results are for $\mu_0 = 3$, $\sigma_0 = 1$, and $\lambda = 1$. The plots in Figs. 7.11-7.12 and Fig.
7.13 are for (a = 0.01, b = 3.0) and (a = 0.9, b = 1.1), respectively, and are based
on $n_s = 50$ independent samples of $\xi(t)$.

Conditional means and variances of X(t) are shown with solid and dotted lines in
the left panel of Fig. 7.11. The solid and dotted lines in the right panel of the figure
are averages of the conditional means and variances, that is, estimates of the mean
and variance of X(t). Figure 7.12 shows with solid lines estimates of the marginal
density of X(t) at times t = 0.5 (left panel) and t = 5 (right panel). The dotted lines in
the figure are Gaussian densities matching the means and variances of X(t) at these
times. They show that X(t) is not a Gaussian process and that the discrepancy between
the marginal distribution of X(t) and the Gaussian distribution changes in time.

Fig. 7.13 Conditional means and variances of X(t) (left panel) and mean and variance of X(t)
(right panel) for (a = 0.9, b = 1.1)


Figure 7.13 shows plots similar to those in Fig. 7.11 for $\xi(t)$ taking values in the
interval [0.9, 1.1] rather than [0.01, 3.0]. In contrast to Fig. 7.11, differences between
conditional means and variances calculated from $n_s = 50$ independent samples of
$\xi(t)$ are small. Marginal distributions of X(t) are indistinguishable from Gaussian
distributions with the means and variances of X(t), and are not shown. This is an
expected result since X(t) becomes an Ornstein-Uhlenbeck process as $|b - a| \to 0$.
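A conditional Monte Carlo calculation of the kind used in this example can be sketched as follows. The sketch is illustrative only: it samples the holding times of $\xi(t)$ as exponential variables with rate lam (the inter-jump times of a homogeneous Poisson process), draws the states from U(a, b), propagates the conditional mean and variance of $dX = -\xi X\,dt + dB$ over each interval, and combines the conditional moments through the law of total variance rather than by plain averaging of the conditional variances. Function names and the seed are hypothetical.

```python
import math
import random


def sample_conditional_moments(t, a, b, lam, mu0, gamma0, rng):
    """Mean and variance of X(t) conditional on one sampled path of xi."""
    mu, gamma, remaining = mu0, gamma0, t
    while remaining > 0.0:
        xi = rng.uniform(a, b)                          # state of xi in this interval
        dt = min(rng.expovariate(lam), remaining)       # holding time, truncated at t
        decay = math.exp(-xi * dt)
        mu *= decay
        gamma = gamma * decay ** 2 + (1.0 - decay ** 2) / (2.0 * xi)
        remaining -= dt
    return mu, gamma


def estimate_moments(t, a, b, lam, mu0, gamma0, n_s, seed=0):
    """Unconditional mean and variance of X(t) from n_s samples of xi."""
    rng = random.Random(seed)
    moments = [sample_conditional_moments(t, a, b, lam, mu0, gamma0, rng)
               for _ in range(n_s)]
    mean = sum(m for m, _ in moments) / n_s
    second = sum(g + m * m for m, g in moments) / n_s   # E[X(t)^2]
    return mean, second - mean * mean


mean_wide, var_wide = estimate_moments(0.5, 0.01, 3.0, 1.0, 3.0, 1.0, n_s=50)
mean_narrow, var_narrow = estimate_moments(0.5, 0.9, 1.1, 1.0, 3.0, 1.0, n_s=50)
print(mean_wide, var_wide, mean_narrow, var_narrow)
```

With a = b the sampled states coincide and the estimates reduce to the Ornstein-Uhlenbeck mean and variance, in line with the limit $|b - a| \to 0$ noted above.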

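If $\lambda \to 0$, the semi-Markov process $\xi(t)$ is the random variable $\xi(t) = \xi(0)\,1(t \ge 0)$
so that the marginal density of X(t) is

\[
f_{X(t)}(x) = \int_a^b \frac{1}{\sigma_{X(t)}(\alpha)}\,
\phi\Big(\frac{x - \mu_{X(t)}(\alpha)}{\sigma_{X(t)}(\alpha)}\Big)\, \frac{d\alpha}{b - a},
\tag{7.67}
\]

where $\phi$ denotes the density of the standard Gaussian variable, and $\mu_{X(t)}(\alpha)$ and
$\sigma_{X(t)}(\alpha)$ denote the mean and standard deviation of X(t) for $\xi(0) = \alpha$. Generally,
$f_{X(t)}(x)$ is not a Gaussian density. ◇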

Example 7.28 Consider a linear oscillator with mass M, damping C, and stiffness
K subjected to Gaussian white noise with mean 0 and one-sided spectral density of
intensity $g_0 > 0$. The oscillator displacement satisfies the differential equation

\[ M \ddot X(t) + C \dot X(t) + K X(t) = W(t), \quad t \ge 0, \tag{7.68} \]

where W(t) is interpreted as the formal derivative of standard Brownian motion B(t)
scaled by $\sqrt{\pi g_0}$, that is, $W(t) = \sqrt{\pi g_0}\, dB(t)/dt$.

It is assumed that (1) K can be in one of the states $k_1 > \cdots > k_r > \cdots > k_{m+1} > 0$,
where $m \ge 1$ is an integer and $X^{(r)}(t)$, defined by

\[ M \ddot X^{(r)}(t) + C \dot X^{(r)}(t) + k_r X^{(r)}(t) = W(t), \quad r = 1, \ldots, m, \tag{7.69} \]

denotes the oscillator displacement in damage state $k_r$, (2) transitions of K are only
possible from $k_r$ to $k_{r+1}$, and they occur when the system state leaves a safe set
$(-x_{cr}, x_{cr})$, $x_{cr} > 0$, and (3) $x_{cr}$ is sufficiently high such that the random times $\Gamma_r$
of residence in damage state $k_r$ are much longer than the duration of transient system
response in this state, so that the crossings of $X^{(r)}(t)$ out of $(-x_{cr}, x_{cr})$ are rare
events. Accordingly, these crossings define approximately a homogeneous Poisson
process with intensity

\[ \mu_r(x_{cr}) = \frac{\nu_r}{\pi} \exp\Big(-\frac{x_{cr}^2}{2\sigma_r^2}\Big) \tag{7.70} \]

for M = 1, where $\sigma_r^2 = \pi g_0/(4 \zeta_r \nu_r^3)$ denotes the stationary variance of $X^{(r)}(t)$,
$\nu_r^2 = k_r$, and $2 \zeta_r \nu_r = C$ ([42], Sect. 12.2), and the distribution of $\Gamma_r$ can be calculated
from

\[ P(\Gamma_r > s) = \exp\big(-\mu_r(x_{cr})\, s\big), \quad s \ge 0. \tag{7.71} \]


Suppose the oscillator fails when its stiffness drops to $k_{m+1}$ during a reference time
interval $[0, \tau]$, so that the probability of failure in $\tau$ is

\[ P_f(\tau) = P\Big(\sum_{r=1}^{m} \Gamma_r \le \tau\Big) = P(T_m \le \tau). \tag{7.72} \]

Under our assumptions, the random times $\{\Gamma_r\}$ are independent random variables so
that the characteristic function of the time to failure $T_m$ is

\[
\varphi_{T_m}(u) = E[e^{i u T_m}] = \prod_{r=1}^{m} E[e^{i u \Gamma_r}]
= \prod_{r=1}^{m} \frac{i\,\mu_r(x_{cr})}{u + i\,\mu_r(x_{cr})}
\tag{7.73}
\]

since $\Gamma_r$ is an exponential random variable with expectation $1/\mu_r(x_{cr})$ ([43], Sect.
26). Since $T_m$ is positive, its distribution can be calculated from ([44], Theorem 3.2.1)

\[
\begin{aligned}
F_{T_m}(t) = P(T_m \le t)
&= \lim_{\bar u \to \infty} \frac{1}{2\pi} \int_{-\bar u}^{\bar u}
   \frac{1 - e^{-i u t}}{i u}\, \varphi_{T_m}(u)\,du \\
&= \lim_{\bar u \to \infty} \frac{1}{2\pi} \int_{-\bar u}^{\bar u}
   \frac{1 - e^{-i u t}}{i u} \prod_{r=1}^{m} \frac{i\,\mu_r(x_{cr})}{u + i\,\mu_r(x_{cr})}\,du,
\end{aligned}
\tag{7.74}
\]

or $F_{T_m}(t) = \int_0^t f_{T_m}(s)\,ds$, where

\[ f_{T_m}(\tau) = \frac{1}{2\pi} \int_{-\infty}^{\infty} e^{-i u \tau}\, \varphi_{T_m}(u)\,du \tag{7.75} \]

is the density of $T_m$. The probability of failure in $[0, \tau]$ is $P_f(\tau) = F_{T_m}(\tau)$.

The numerical results in the following figures are for M = 1, $C = \pi/10$, $\nu_1 = \pi$,
$\nu_2 = 0.9\pi$, $\nu_3 = 0.8\pi$, $\nu_4 = 0.7\pi$, $\nu_5 = 0.6\pi$, $k_r = \nu_r^2$, and $\zeta_r = C/(2\nu_r)$. The
noise intensity and the critical threshold are $g_0 = 1$ and $x_{cr} = 1.5$. The oscillator
fails when its stiffness drops to $k_5 = \nu_5^2 = (0.6\pi)^2$. Other failure conditions can be
considered. The average residence times in the first four damage states are
$E[\Gamma_1] = 4.3949$, $E[\Gamma_2] = 3.6859$, $E[\Gamma_3] = 3.2240$, and $E[\Gamma_4] = 2.9508$. Figure 7.14
shows with solid and dotted lines the real and the imaginary parts of the characteristic
functions of $\Gamma_r$, $r = 1, \ldots, 4$. The left and right panels in Fig. 7.15 show the density
of failure time $T_m$ and the failure probability $P_f(\tau)$. ◇
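The failure probability can also be evaluated without numerical Fourier inversion. The sketch below is an alternative to the inversion formula, not the method of the example: since the residence times are independent exponential variables with distinct rates, the time to failure has a hypoexponential distribution with a known closed form. The rates are taken as the reciprocals of the average residence times quoted above; the function names, sample size, and seed are illustrative.

```python
import math
import random

# Average residence times quoted in the example; rates are their reciprocals
mean_residence = [4.3949, 3.6859, 3.2240, 2.9508]
rates = [1.0 / m for m in mean_residence]


def failure_probability(tau, rates):
    """P(sum of independent Exp(rate_i) <= tau), valid for distinct rates."""
    survival = 0.0
    for i, mu_i in enumerate(rates):
        weight = 1.0
        for j, mu_j in enumerate(rates):
            if j != i:
                weight *= mu_j / (mu_j - mu_i)
        survival += weight * math.exp(-mu_i * tau)
    return 1.0 - survival


def failure_probability_mc(tau, rates, n, seed=0):
    """Monte Carlo estimate of the same probability, used as a cross-check."""
    rng = random.Random(seed)
    hits = sum(sum(rng.expovariate(mu) for mu in rates) <= tau for _ in range(n))
    return hits / n


p_closed = failure_probability(15.0, rates)
p_mc = failure_probability_mc(15.0, rates, 20000)
print(p_closed, p_mc)
```

The closed form loses accuracy when two rates nearly coincide, in which case the inversion formula or simulation is the safer route.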


7.4.5 State Augmentation 

The theory of stochastic differential equations can be applied to find properties of the
solutions of both linear and nonlinear differential equations with random coefficients
and input under some mild conditions, as has already been mentioned at the end of
Sect. 7.4.1. For example, properties of X(t) in Example 7.26 can be obtained from
those of the bivariate diffusion process with coordinates $(X_1(t), X_2(t))$ defined by

\[
\begin{aligned}
dX_1(t) &= -X_2(t)\, X_1(t)\,dt + dB(t) \\
dX_2(t) &= 0,
\end{aligned}
\tag{7.76}
\]

and initial conditions $(X_1(0) \sim N(\mu_0, \gamma_0),\ X_2(0) = A)$, where $X_1(0)$, A, and B(t)
are mutually independent. The drift and diffusion coefficients in (7.76) satisfy the
uniform Lipschitz conditions in (5.29), but are not bounded so that Theorem 5.7
cannot be applied directly. Arguments as in Example 5.11 can be used to apply this
theorem and conclude that (7.76) has a strong solution that is unique in the strong
sense.

The assumptions that the driving noise is a Gaussian white noise and A is
time invariant are not essential. For example, suppose A is a time-variant random
coefficient and dB(t) is replaced by Y(t) dt, where A and Y are defined by
$dA(t) = \alpha_1(A(t))\,dt + \alpha_2(A(t))\,dB_1(t)$ and $dY(t) = \beta_1(Y(t))\,dt + \beta_2(Y(t))\,dB_2(t)$,
respectively, and $B_1$ and $B_2$ are mutually independent standard Brownian motions that
are independent of B. The $\mathbb{R}^3$-valued stochastic process $(X_1 = X, X_2 = A, X_3 = Y)$
defined by

\[
\begin{aligned}
dX_1(t) &= -X_2(t)\, X_1(t)\,dt + X_3(t)\,dt \\
dX_2(t) &= \alpha_1(X_2(t))\,dt + \alpha_2(X_2(t))\,dB_1(t) \\
dX_3(t) &= \beta_1(X_3(t))\,dt + \beta_2(X_3(t))\,dB_2(t)
\end{aligned}
\tag{7.77}
\]

is a diffusion process. If the drift and the diffusion coefficients of (7.77) satisfy the
conditions of Theorem 5.7 or 5.8, this equation has a unique solution.

Fig. 7.14 Real and imaginary parts of the characteristic function of $\Gamma_r$, $r = 1, \ldots, 4$, in solid
and dotted lines, respectively

Fig. 7.15 Density of failure time $T_m$ (left panel) and failure probability $P_f(\tau)$ for $x_{cr} = 1.5$
(right panel)

While conceptually attractive, the state augmentation method has a limited use in 
applications since augmented equations have higher dimension than that of original 
equations and, usually, are difficult to solve. For example, the augmented equation 
for a linear system with random coefficients is a nonlinear stochastic differential 
equation, as illustrated by (7.76). 

Example 7.29 The moments $\mu(p, q; t) = E[X_1(t)^p X_2(t)^q]$ of the bivariate
diffusion process in (7.76) satisfy the ordinary differential equation

\[ \dot\mu(p, q; t) = -p\,\mu(p, q + 1; t) + \frac{p(p - 1)}{2}\,\mu(p - 2, q; t) \]

with appropriate initial conditions. These equations cannot be solved exactly since
they are not closed.

The density $f(x_1, x_2; t)$ of $(X_1(t), X_2(t))$ satisfies the Fokker-Planck equation

\[
\frac{\partial f(x_1, x_2; t)}{\partial t}
= \frac{\partial}{\partial x_1}\Big[x_1 x_2\, f(x_1, x_2; t)
  + \frac{1}{2}\, \frac{\partial f(x_1, x_2; t)}{\partial x_1}\Big]
\]

so that, under the assumption A > 0 a.s., $f(x_1, x_2; t)$ converges as $t \to \infty$ to a
time-invariant density $f_s(x_1, x_2)$ defined by

\[
\frac{\partial}{\partial x_1}\Big[x_1 x_2\, f_s(x_1, x_2)
  + \frac{1}{2}\, \frac{\partial f_s(x_1, x_2)}{\partial x_1}\Big] = 0.
\]

The solutions of both the transient and stationary Fokker-Planck equations are difficult
to obtain when dealing with realistic systems since the state vectors of these systems
usually have large dimensions. For the simple case considered here, the stationary density
of $(X_1(t), X_2(t))$ has the expression

\[ f_s(x_1, x_2) = \sqrt{x_2/\pi}\, \exp(-x_1^2 x_2)\, f_a(x_2), \]

where $f_a$ denotes the density of A. Additional examples and considerations on the
state augmentation method can be found elsewhere ([4], Sect. 9.2.4). ◇

Proof For large times, we have $X(t) \mid A \sim N(0, 1/(2A))$, so that its density is
$f_s(x_1 \mid x_2) = \sqrt{x_2/\pi}\, \exp(-x_1^2 x_2)$ and $f_s(x_1, x_2) = f_s(x_1 \mid x_2)\, f_a(x_2)$.
Straightforward calculations show that $f_s(x_1, x_2)$ so defined satisfies the above stationary
Fokker-Planck equation. ▲
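The calculation behind this statement is short and can be written out. It uses only the density $f_s(x_1, x_2) = \sqrt{x_2/\pi}\,\exp(-x_1^2 x_2)\, f_a(x_2)$, where $f_a$ denotes the density of A, and the stationary Fokker-Planck equation of Example 7.29; the factor $f_a(x_2)$ does not depend on $x_1$ and passes through the differentiation.

```latex
\begin{aligned}
\frac{\partial f_s(x_1, x_2)}{\partial x_1}
  &= \sqrt{x_2/\pi}\,\bigl(-2 x_1 x_2\bigr)\exp(-x_1^2 x_2)\, f_a(x_2)
   = -2\, x_1 x_2\, f_s(x_1, x_2), \\
x_1 x_2\, f_s(x_1, x_2) + \frac{1}{2}\,\frac{\partial f_s(x_1, x_2)}{\partial x_1}
  &= x_1 x_2\, f_s(x_1, x_2) - x_1 x_2\, f_s(x_1, x_2) = 0, \\
\frac{\partial}{\partial x_1}\Bigl[\, x_1 x_2\, f_s(x_1, x_2)
  + \frac{1}{2}\,\frac{\partial f_s(x_1, x_2)}{\partial x_1}\Bigr] &= 0 .
\end{aligned}
```

The bracketed term vanishes identically, not only its derivative, so the stationary probability flux in the $x_1$ direction is zero for every value $x_2$ of A.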

Example 7.30 The displacement X(t) of a simple oscillator with unit mass, damping
C > 0, and stiffness K > 0 subjected to a harmonic action satisfies the differential
equation $\ddot X(t) + C \dot X(t) + K X(t) = \alpha \sin(\nu t)$. It is assumed that C > 0 and K > 0
are independent random variables and $\alpha, \nu > 0$ are some constants. The augmented
$\mathbb{R}^4$-valued process Z(t) with coordinates $(Z_1 = X, Z_2 = \dot X, Z_3 = C, Z_4 = K)$ is
defined by the differential equation

\[
dZ(t) =
\begin{bmatrix}
Z_2(t) \\
-Z_4(t)\, Z_1(t) - Z_3(t)\, Z_2(t) + \alpha \sin(\nu t) \\
0 \\
0
\end{bmatrix} dt
= h(Z(t), t)\,dt
\]

so that its density $f(z; t)$ satisfies

\[
\frac{\partial f}{\partial t}
= -\frac{\partial (z_2 f)}{\partial z_1}
  - \frac{\partial}{\partial z_2}\Big[\big(-z_4 z_1 - z_3 z_2 + \alpha \sin(\nu t)\big) f\Big],
\]

known as the Liouville equation ([4], Sect. 9.2.5). ◇

Proof Let $\varphi(u; t) = E[\exp(i\, u' Z(t))]$ denote the characteristic function of Z(t) so
that $f(z; t) = \int_{\mathbb{R}^4} \exp(-i\, u' z)\, \varphi(u; t)\,du/(2\pi)^4$. The time derivative of $\varphi(u; t)$ is

\[
\frac{\partial \varphi}{\partial t}
= i \sum_{k=1}^{4} u_k\, E[\dot Z_k(t)\, e^{i u' Z(t)}]
= i \sum_{k=1}^{4} u_k\, E[h_k(Z(t), t)\, e^{i u' Z(t)}].
\]

The Fourier transforms of the left and the right sides of this equation are $\partial f/\partial t$
and $-\sum_{k=1}^{4} \int_{\mathbb{R}^4} [\partial (f h_k)/\partial z_k]\, \exp(i\, u' z)\,dz$, respectively. Integration by parts of the
latter term gives the right side of the Liouville equation. ▲


7.4.6 Stochastic Reduced Order Models 

Properties of stochastic reduced order models (SROMs) are discussed in Sect. A.3.
These models are simple random elements, that is, elements that have a finite number 
m of samples that may not be equally likely. Optimization algorithms can be used to 
select the samples of SROMs and their probabilities. SROMs are used to approxi- 
mate target random elements in the definition of stochastic differential equations and 
construct simple representations for the solutions of these equations. 

Example 7.31 Let X(t) be the solution of

\[ \dot X(t) + A X(t) = Y(t), \quad t \in I = [0, \tau], \tag{7.78} \]

with $A \sim U(a_1, a_2)$, $0 < a_1 < a_2 < \infty$, and $Y(t) = \cos(\nu t)$. This equation is also
solved by the stochastic Galerkin and stochastic collocation methods in Examples
7.33 and 7.35. Consider a SROM $\tilde A$ for A with samples $a^{(k)} = a_1 + (k - 1)\Delta a$,
$k = 1, \ldots, m$, $m \ge 3$, that are equally spaced at $\Delta a = (a_2 - a_1)/(m - 1)$. The
probabilities of these samples are selected to be $p_1 = p_m = 1/(2(m - 1))$ and
$p_k = 1/(m - 1)$, $k = 2, \ldots, m - 1$. Note that $\tilde A$ has been constructed by heuristic
considerations, and may not be optimal.

The solution of $\dot X(t) + A X(t) = \cos(\nu t)$, $t \ge 0$, with X(0) = 0 and A set equal
to $a^{(k)}$, $k = 1, \ldots, m$, is

\[
X^{(k)}(t) = \frac{1}{(a^{(k)})^2 + \nu^2}
\Big(a^{(k)} \cos(\nu t) + \nu \sin(\nu t) - a^{(k)} \exp(-a^{(k)} t)\Big).
\tag{7.79}
\]

The functions $\{X^{(k)}(t)\}$ and the probabilities $\{p_k\}$ of the samples of $\tilde A$ define a
SROM $\tilde X(t)$ for X(t), which is used to approximate properties of X(t). For example,
moments of order r of $\tilde X(t)$ are given by

\[ E[\tilde X(t)^r] = \sum_{k=1}^{m} p_k\, \big(X^{(k)}(t)\big)^r, \quad t \ge 0. \tag{7.80} \]

The plots in the following two figures are for $[a_1, a_2] = [1, 5]$ and $\nu = 5$. Figure
7.16 shows Monte Carlo estimates of $E[X(t)^2]$ (left panel) and $E[X(t)^4]$ (right panel)
based on 1000 independent samples of X(t). Approximations of these moments based
on a SROM of A with m = 5 are indistinguishable from Monte Carlo estimates at the
figure scale. The thin solid lines in Fig. 7.17 are Monte Carlo estimates of $E[X(t)^4]$
based on sets of five independent samples of X(t). The heavy solid line is the Monte
Carlo estimate of $E[X(t)^4]$ in Fig. 7.16 (right panel). In contrast to SROM-based
estimates, Monte Carlo estimates of $E[X(t)^4]$ based on m = 5 samples are unstable,
and can be inaccurate. ◇
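The SROM construction of this example can be sketched in a few lines. The code is a minimal illustration, not the implementation behind the figures: it builds the equally spaced samples and heuristic probabilities described above, evaluates the closed-form solution for each sample, and forms the SROM moment as a probability-weighted sum. The function names are hypothetical; the values $[a_1, a_2] = [1, 5]$, $\nu = 5$, and m = 5 are those quoted in the example.

```python
import math


def srom_for_uniform(a1, a2, m):
    """Equally spaced SROM samples of A ~ U(a1, a2) and their probabilities."""
    da = (a2 - a1) / (m - 1)
    samples = [a1 + k * da for k in range(m)]
    probs = [1.0 / (m - 1)] * m
    probs[0] = probs[-1] = 1.0 / (2 * (m - 1))   # end samples carry half weight
    return samples, probs


def solution(t, a, nu):
    """Solution of x' + a x = cos(nu t), x(0) = 0, for a fixed coefficient a."""
    return (a * math.cos(nu * t) + nu * math.sin(nu * t)
            - a * math.exp(-a * t)) / (a * a + nu * nu)


def srom_moment(t, r, samples, probs, nu):
    """Moment of order r of the SROM solution at time t."""
    return sum(p * solution(t, a, nu) ** r for a, p in zip(samples, probs))


samples5, probs5 = srom_for_uniform(1.0, 5.0, 5)
second_moment = srom_moment(1.0, 2, samples5, probs5, 5.0)
fourth_moment = srom_moment(1.0, 4, samples5, probs5, 5.0)
print(second_moment, fourth_moment)
```

With these probabilities the SROM moment coincides with the trapezoidal rule applied to the integral defining the moment, which helps explain why a model with only a few samples can be stable where small Monte Carlo sets are not.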

Example 7.32 Let X(t) be the solution of (7.78) with A = 1 and $Y(t) = \cos(\Theta t)$,
$\Theta \sim U(\nu_1, \nu_2)$, $0 < \nu_1 < \nu_2 < \infty$. This equation is also solved in Examples 7.34 and
7.36 by the stochastic Galerkin and stochastic collocation methods. Let $\tilde\Theta$ with
samples $\nu^{(k)} = \nu_1 + (k - 1)\Delta\nu$, $k = 1, \ldots, m$, $\Delta\nu = (\nu_2 - \nu_1)/(m - 1)$, $m \ge 3$,
and probabilities $p_1 = p_m = 1/(2(m - 1))$ and $p_k = 1/(m - 1)$, $k = 2, \ldots, m - 1$,
be a SROM for $\Theta$. The solutions $X^{(k)}(t)$ of $\dot X(t) + X(t) = \cos(\Theta t)$, $t \ge 0$, for
$\Theta = \nu^{(k)}$ and X(0) = 0 are

\[
X^{(k)}(t) = \frac{1}{1 + (\nu^{(k)})^2}
\Big(\cos(\nu^{(k)} t) + \nu^{(k)} \sin(\nu^{(k)} t) - \exp(-t)\Big)
\tag{7.81}
\]

so that $\tilde X(t)$ with samples $\{X^{(k)}(t)\}$ and probabilities $\{p_k\}$ is a SROM for X(t).
Properties of $\tilde X(t)$ can be calculated simply, as illustrated in (7.80).

Numerical results in the following figures are for $[\nu_1, \nu_2] = [1, 5]$ and several
values of m. The solid lines in Fig. 7.18 are Monte Carlo estimates of $E[X(t)^2]$
and $E[X(t)^4]$ based on 1000 independent samples of X(t). The dotted lines are
approximations of these moments based on SROMs with m = 10 (left panels)
and m = 30 (right panels). There is a notable improvement in the quality of the
approximations $E[\tilde X(t)^r]$ for $E[X(t)^r]$ as the model size is increased from m = 10
to m = 30. The approximations $E[\tilde X(t)^r]$ corresponding to a SROM with m = 30
trace the target moments over the entire time range. The heavy solid line in Fig. 7.19
is the Monte Carlo estimate in the right panels of Fig. 7.18. The thin solid lines in
the figure are Monte Carlo estimates of $E[X(t)^4]$ based on sets of 30 independent
samples of X(t). These estimates are unstable and have notable errors, in contrast to
SROM-based approximations using m = 30 samples. ◇


7.4.7 Stochastic Galerkin Method 

We consider stochastic differential equations of the type in (7.1) and (7.2). If X(t) is 
the solution of (7.1), the random functions in the definition of the drift and diffusion 
of this equation need to be approximated by parametric models, that is, deterministic 
functions of time depending on a finite number of random variables. This approxi- 
mation, referred to as the discretization of the probability space, is examined to some 
extent in Sect. 9.4.6.1.

Galerkin solutions for stochastic differential equations with random coefficients
are weak solutions for these equations corresponding to specified spaces of trial
functions. We have seen in Examples 7.24 and 7.25 how the Babuska-Lax-Milgram
theorem can be applied to prove the existence and the uniqueness of weak solutions
for stochastic differential equations with random coefficients. The following
examples show how the stochastic Galerkin method can be implemented to calculate weak
solutions for this class of stochastic equations.

Fig. 7.16 Monte Carlo estimates of $E[X(t)^2]$ (left panel) and $E[X(t)^4]$ (right panel) and
SROM-based approximations for m = 5

Fig. 7.17 Monte Carlo estimates of $E[X(t)^4]$ based on 1000 independent samples of X(t)
(heavy solid line) and on sets of five independent samples of X(t) (thin solid lines)

Example 7.33 Let us solve the stochastic differential equation in Example 7.31 by
the stochastic Galerkin method. We have seen in Example 7.24 that this equation
admits a unique weak solution. Since $A \sim U(a_1, a_2)$, $0 < a_1 < a_2 < \infty$, has
bounded support, we expand both the random coefficient $A$ and the solution $X(t)$
in the subspace spanned by Legendre polynomials $\{\phi_i(U),\ i = 0, 1, \ldots, m\}$ up to a
degree $m > 1$, that is,
Fig. 7.18 Monte Carlo estimates of $E[X(t)^r]$ based on 1000 independent samples of X(t) (solid
lines) and approximations of $E[X(t)^r]$ (dashed lines) based on SROMs with m = 10 (left panels)
and m = 30 (right panels)

Fig. 7.19 Monte Carlo estimates of $E[X(t)^4]$ based on 1000 independent samples of X(t)
(heavy solid line) and corresponding estimates based on sets of 30 independent samples
of X(t) (thin solid lines)


\[
A = \frac{a_1 + a_2}{2} + \frac{a_2 - a_1}{4}\,\phi_1(U), \qquad
\tilde{X}(t) = \sum_{i=0}^{m} \beta_i(t)\,\phi_i(U), \tag{7.82}
\]
Fig. 7.20 Means (left panel) and standard deviations (right panel) of $X(t)$ and $\tilde{X}(t)$ for m = 4


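where $U \sim U(-1, 1)$. With these representations, the defining equation for $\tilde{X}(t)$
becomes

\[
\sum_{i=0}^{m} \dot{\beta}_i(t)\,\phi_i(U)
+ \left( \frac{a_1 + a_2}{2} + \frac{a_2 - a_1}{4}\,\phi_1(U) \right) \sum_{i=0}^{m} \beta_i(t)\,\phi_i(U)
= \cos(\nu t), \tag{7.83}
\]

and has projections

\[
\sum_{i=0}^{m} \dot{\beta}_i(t)\,E[\phi_i(U)\phi_j(U)]
+ \frac{a_1 + a_2}{2} \sum_{i=0}^{m} \beta_i(t)\,E[\phi_i(U)\phi_j(U)]
+ \frac{a_2 - a_1}{4} \sum_{i=0}^{m} \beta_i(t)\,E[\phi_1(U)\phi_i(U)\phi_j(U)]
= \cos(\nu t)\,E[\phi_j(U)], \quad j = 0, 1, \ldots, m, \tag{7.84}
\]

on $\phi_j(U)$. The coefficients $\{\beta_i(t)\}$ in the representation of $\tilde{X}(t)$ satisfy a system of
coupled linear differential equations. The solution of this system of equations and
the expression of $\tilde{X}(t)$ can be used to find properties of $X(t)$ approximately.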

Figure 7.20 shows estimates of the means (left panel) and standard deviations
(right panel) of $X(t)$ and $\tilde{X}(t)$ with solid and dotted lines, respectively. The estimates
are based on 1000 independent samples of these processes, where $\tilde{X}(t)$ is given by
(7.82) with m = 4. The first moment of the approximate solution $\tilde{X}(t)$ traces closely
the corresponding moment of $X(t)$. ◊
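The coupled system for the coefficients can be sketched numerically. The following Python fragment is only an illustration under stated assumptions, not the implementation used for Fig. 7.20: it assumes the initial condition X(0) = 0, uses the unnormalized Legendre polynomials with $\phi_0 = 1$, $\phi_1(u) = 2u$, and relies on the identity $\phi_1\phi_i = (\phi_{i+1} + 4i^2\phi_{i-1})/(2i+1)$ to evaluate the triple products in (7.84). The parameter values $a_1 = 0.1$, $a_2 = 5$, $\nu = 5$ and all function names are hypothetical choices made for the example.

```python
from math import cos, sin, exp

def galerkin_rhs(t, beta, c0, c1, nu):
    """Right side of the projected system after dividing equation j by E[phi_j(U)^2]."""
    m = len(beta) - 1
    out = []
    for j in range(m + 1):
        # E[phi_1 phi_i phi_j] is nonzero only for i = j - 1 and i = j + 1.
        lower = beta[j - 1] / (2 * j - 1) if j >= 1 else 0.0
        upper = 4 * (j + 1) ** 2 * beta[j + 1] / (2 * j + 3) if j < m else 0.0
        forcing = cos(nu * t) if j == 0 else 0.0   # E[phi_j(U)] = 0 for j >= 1
        out.append(forcing - c0 * beta[j] - c1 * (lower + upper))
    return out

def solve_galerkin(a1, a2, nu, m, t_end, steps=2000):
    """Integrate beta_0, ..., beta_m with the classical RK4 scheme, beta_i(0) = 0."""
    c0, c1 = (a1 + a2) / 2, (a2 - a1) / 4          # A = c0 + c1 * phi_1(U)
    beta, t, h = [0.0] * (m + 1), 0.0, t_end / steps
    for _ in range(steps):
        k1 = galerkin_rhs(t, beta, c0, c1, nu)
        k2 = galerkin_rhs(t + h / 2, [b + h / 2 * k for b, k in zip(beta, k1)], c0, c1, nu)
        k3 = galerkin_rhs(t + h / 2, [b + h / 2 * k for b, k in zip(beta, k2)], c0, c1, nu)
        k4 = galerkin_rhs(t + h, [b + h * k for b, k in zip(beta, k3)], c0, c1, nu)
        beta = [b + h / 6 * (p + 2 * q + 2 * r + s)
                for b, p, q, r, s in zip(beta, k1, k2, k3, k4)]
        t += h
    return beta

def exact_solution(t, a, nu):
    """Solution of dX/dt + a X = cos(nu t), X(0) = 0, for a fixed value a of A."""
    return (a * cos(nu * t) + nu * sin(nu * t) - a * exp(-a * t)) / (a * a + nu * nu)

def exact_mean(t, a1, a2, nu, n=2000):
    """Reference value of E[X(t)] for A ~ U(a1, a2) by Simpson's rule (n even)."""
    h = (a2 - a1) / n
    total = exact_solution(t, a1, nu) + exact_solution(t, a2, nu)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * exact_solution(t, a1 + k * h, nu)
    return total * h / 3 / (a2 - a1)

beta = solve_galerkin(0.1, 5.0, 5.0, m=4, t_end=1.0)
# The mean of the Galerkin solution is beta_0(t), since E[phi_i(U)] = 0 for i >= 1.
print(beta[0], exact_mean(1.0, 0.1, 5.0, 5.0))
```

Because the basis is orthogonal, each projected equation can be divided by $E[\phi_j(U)^2]$, so no linear system has to be solved at each time step; only neighbouring coefficients are coupled.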

Proof It is argued correctly that Legendre rather than Hermite polynomials should 
be used to represent random coefficients with bounded support, such as A in this 
illustration. However, Legendre polynomials may not be adequate for solutions of 
stochastic equations whose probability laws may differ significantly from those of 
their coefficients. 

We consider trial functions $\phi(t; U) = \sum_{i=0}^{m} \beta_i(t)\,\phi_i(U)$, where $\phi(0; U) = 0$
and $\beta_i(t)$ are real-valued functions of time. The corresponding members of $\mathcal{V}$ and
$\mathcal{W}$ in (7.46) are such that $E[\int_0^\tau \phi(t; U)^2\,dt] < \infty$ and $E[\int_0^\tau \phi(t; U)^2\,dt] < \infty$,
respectively. The representation in (7.82) is consistent with the assumption that $\tilde{X}(t)$
is a member of $\mathcal{W}$.

The Legendre polynomials are defined by $\phi_i(x) = d^i\big((x^2 - 1)^i\big)/dx^i$, $i = 0, 1, \ldots$,
and can be obtained from the recurrence formula

\[
\phi_{i+1}(x) = 2(2i + 1)\,x\,\phi_i(x) - 4i^2\,\phi_{i-1}(x), \quad x \in [-1, 1], \tag{7.85}
\]

for $i \ge 1$ with $\phi_0(x) = 1$ and $\phi_1(x) = 2x$ (Sect. B.6). These polynomials have
norms $\|\phi_i\|^2 = E[\phi_i(U)^2] = (i!)^2\,2^{2i}/(2i + 1)$. ▲
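The recurrence and the norm formula can be checked with a few lines of code. The sketch below is an illustration only; it stores each polynomial as a list of coefficients (lowest power first) and uses the moments $E[U^k] = 1/(k+1)$ for even $k$ and $0$ for odd $k$ of $U \sim U(-1, 1)$. The function names are hypothetical.

```python
from math import factorial

def legendre_coefficients(m):
    """Coefficients (lowest power first) of phi_0, ..., phi_m from the recurrence (7.85)."""
    phis = [[1.0], [0.0, 2.0]]                      # phi_0(x) = 1, phi_1(x) = 2x
    for i in range(1, m):
        nxt = [0.0] * (i + 2)
        for k, c in enumerate(phis[i]):             # 2(2i+1) x phi_i(x)
            nxt[k + 1] += 2 * (2 * i + 1) * c
        for k, c in enumerate(phis[i - 1]):         # -4 i^2 phi_{i-1}(x)
            nxt[k] -= 4 * i * i * c
        phis.append(nxt)
    return phis[: m + 1]

def expect_product(p, q):
    """E[p(U) q(U)] for U ~ U(-1, 1), computed from the moments of U."""
    return sum(a * b / (i + j + 1)
               for i, a in enumerate(p) for j, b in enumerate(q) if (i + j) % 2 == 0)

def norm_squared(i):
    """Closed form (i!)^2 2^(2i) / (2i + 1) quoted in the text."""
    return factorial(i) ** 2 * 4 ** i / (2 * i + 1)

phis = legendre_coefficients(4)
for i, p in enumerate(phis):
    # The two printed numbers should agree for every degree i.
    print(i, expect_product(p, p), norm_squared(i))
```

For instance, the recurrence with $i = 1$ gives $\phi_2(x) = 12x^2 - 4$, which coincides with the second derivative of $(x^2 - 1)^2$.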

Example 7.34 Let us solve the stochastic differential equation in Example 7.32 by the
stochastic Galerkin method. The construction of the Galerkin solution is simple in this
case since the uncertainty is in the input and the differential equation $\dot{X}(t) + A X(t) =
Y(t)$, $Y(t) = \cos(\Theta t)$, is linear. Since $\Theta$ takes values in a bounded interval, we
view the input $Y(t)$ as a member of the space spanned by Legendre polynomials
$\{\phi_i(U),\ i = 0, 1, \ldots, m\}$, $U \sim U(-1, 1)$, up to a degree $m > 1$, that is,

\[
Y(t) \simeq \tilde{Y}(t) = \sum_{i=0}^{m} \alpha_i(t)\,\phi_i(U), \tag{7.86}
\]

where $\alpha_i(t) = E[\cos(\Theta t)\,\phi_i(U)]/E[\phi_i(U)^2]$. The solution for the input $\tilde{Y}(t)$ in (7.86)
gives $X(t) \simeq \tilde{X}(t) = \sum_{i=0}^{m} \beta_i(t)\,\phi_i(U)$, where $\beta_i(t) = \int_0^t \exp(-(t - s))\,\alpha_i(s)\,ds$.
Properties of $X(t)$ can be approximated by those of $\tilde{X}(t)$, for example, $E[X(t)^r] \simeq
E[\tilde{X}(t)^r] = E[(\sum_{i=0}^{m} \beta_i(t)\,\phi_i(U))^r]$, and can be estimated simply and efficiently
from samples of $U$. Numerical results are for $A = 1$.

The solid and dotted lines in Fig. 7.21 are estimates of the first two moments of
$X(t)$ and $\tilde{X}(t)$ for m = 4 (left panel) and m = 9 (right panel) obtained from 1000
independent samples of these processes. The plots that take only positive values
and positive/negative values are standard deviations and expectations, respectively.
There is a notable improvement in the quality of the approximate moments of $X(t)$
as m is increased. This is consistent with the fact that $\tilde{Y}(t)$ converges in m.s. to $Y(t)$
as $m \to \infty$. Yet, the accuracy of higher order moments of $\tilde{X}(t)$ improves much
more slowly with m. For example, the maxima of $(E[\tilde{Y}(t)^4] - E[Y(t)^4])/E[Y(t)^4]$ in
percentages are 90.81%, 25%, and 22% for m = 4, 9, and 19, respectively. ◊

Proof The coefficients $\alpha_i(t)$ in the representation of $\tilde{Y}(t)$ are obtained by projecting
this process on the coordinates of the selected basis, that is, by imposing the condition
$E[\cos(\Theta t)\,\phi_j(U)] = E[\sum_{i=0}^{m} \alpha_i(t)\,\phi_i(U)\,\phi_j(U)]$. The representation of $\tilde{X}(t)$ and
the defining equation for $X(t)$ give

\[
\sum_{i=0}^{m} \dot{\beta}_i(t)\,\phi_i(U) + \sum_{i=0}^{m} \beta_i(t)\,\phi_i(U) = \sum_{i=0}^{m} \alpha_i(t)\,\phi_i(U).
\]

The projection of this equation on $\phi_j(U)$, $j = 0, 1, \ldots, m$, gives the differential
equation $\dot{\beta}_i(t) + \beta_i(t) = \alpha_i(t)$, $i = 0, 1, \ldots, m$, with initial condition $\beta_i(0) = 0$
Fig. 7.21 Means and standard deviations of $X(t)$ and $\tilde{X}(t)$ for m = 4 (left panel) and m = 9 (right
panel)


following from $X(0) = 0$. Note that, in contrast to the problem in Example 7.33, the
equations defining the coefficients $\{\beta_i(t)\}$ of the approximate solutions are decoupled.
Moreover, the equations for $\{\beta_i(t)\}$ delivered by the weak formulation coincide
with those obtained by direct calculations using the linearity of the defining equation
for $X(t)$ and the representation of $\tilde{Y}(t)$. ▲


7.4.8 Stochastic Collocation Method 

Let $X(t)$, $t \in [0, \tau]$, be an $\mathbb{R}^d$-valued stochastic process satisfying a stochastic
differential equation of the type in (7.2) with coefficients and/or input depending on a
random vector $\Theta$. If the coefficients and/or input are random functions, they need to
be approximated by parametric models, that is, finite sums of specified deterministic
functions of time with random coefficients. We have seen that similar representations
are needed to implement the stochastic Galerkin method.

For simplicity, $X(t)$, $t \in [0, \tau]$, is assumed to be a real-valued stochastic process
defined by a stochastic differential equation with random coefficients and input. Suppose
the random functions in the definition of this equation are parametric and depend
on a random vector $\Theta$ defined on a probability space $(\Omega, \mathcal{F}, P)$. The collocation
solutions for this equation have the form

\[
\tilde{X}(t, a) = \sum_{i=1}^{m} X^{(i)}(t)\,\ell_i(a), \quad t \in [0, \tau],\ a \in \Gamma, \tag{7.87}
\]

where $\Gamma = \Theta(\Omega)$ is the range of $\Theta$, $\{a^{(i)} \in \Gamma,\ i = 1, \ldots, m\}$ denote
collocation points, and $X^{(i)}(t) = X(t, a^{(i)})$ are solutions of the deterministic
equations obtained from the stochastic differential equation with $\Theta$ set equal to
$a^{(i)} \in \Gamma$, $i = 1, \ldots, m$. As for the stochastic Galerkin method, it is assumed that $\Gamma$
is a bounded set and that the equation defining $X(t)$ has solutions for almost all
$(t, a) \in [0, \tau] \times \Gamma$. A broad range of theoretical considerations on the stochastic
collocation method and practical aspects related to the implementation of this method
can be found, for example, in [45-47].

Theorem 7.18 If $X(t, a)$ is a continuous function of $a \in \Gamma$ for a fixed $t \in [0, \tau]$,
there exists an algebraic polynomial $p(t, \cdot) : \Gamma \to \mathbb{R}$ such that $\|X(t, \cdot) -
p(t, \cdot)\|_\infty < \varepsilon$ for arbitrary $\varepsilon > 0$.

Proof The polynomial $p$ in the theorem is the collocation approximation $\tilde{X}(t, a)$ at
time $t$. The proof follows from properties of polynomial approximations (Theorems
8.10 and 8.15).

Note that the random variables $X(t)$ are bounded with probability 1 for each $t$ since
$X(t, \cdot) : \Gamma \to \mathbb{R}$ is assumed to be continuous and $\Gamma$ is bounded. The degree of the
polynomial $p$ satisfying $\|X(t, \cdot) - p(t, \cdot)\|_\infty < \varepsilon$ depends on $t$ since the properties
of $X(t, \cdot)$ change in time. If $X(t, a)$ is approximated by polynomials of $a$ with the
same functional forms and time dependent coefficients, the accuracy of the resulting
collocation solutions will not be uniform in $t$. ▲

Theorem 7.19 Under the assumptions in Theorem 7.18, the discrepancy between
the correlations $E[X(s)X(t)]$ and $E[\tilde{X}(s)\tilde{X}(t)]$ can be made as small as desired for
arbitrary but fixed times $s$ and $t$.

Proof For $s, t \in [0, \tau]$ fixed we have

\[
\begin{aligned}
|E[X(s)X(t)] - E[\tilde{X}(s)\tilde{X}(t)]| &= |E[X(s)X(t) - \tilde{X}(s)\tilde{X}(t)]| \\
&\le E[|X(s)X(t) - \tilde{X}(s)\tilde{X}(t)|]
 \le E\big[|X(s)|\,|X(t) - \tilde{X}(t)| + |X(s) - \tilde{X}(s)|\,|\tilde{X}(t)|\big] \\
&\le \big(E[X(s)^2]\,E[(X(t) - \tilde{X}(t))^2]\big)^{1/2} + \big(E[(X(s) - \tilde{X}(s))^2]\,E[\tilde{X}(t)^2]\big)^{1/2}
\end{aligned}
\]

so that $|E[X(s)X(t)] - E[\tilde{X}(s)\tilde{X}(t)]|$ can be made smaller than $\varepsilon > 0$ since $E[X(s)^2]$
and $E[\tilde{X}(t)^2]$ are bounded and it is possible to construct a polynomial approximation
$\tilde{X}$ whose error at $s$ and $t$ is smaller than $\varepsilon/(2c)$, where $c > 0$ is a finite constant such
that $E[X(s)^2] \vee E[\tilde{X}(t)^2] \le c$.

Similar arguments can be used to show that $|E[X(s)X(t)] - E[\tilde{X}(s)\tilde{X}(t)]| < \varepsilon$
holds at a finite number of times $s$ and $t$. However, additional conditions are needed
for $E[\tilde{X}(s)\tilde{X}(t)]$ to approximate $E[X(s)X(t)]$ within $\varepsilon$ at all times $s, t \in [0, \tau]$. ▲

Example 7.35 Let $X(t)$ be the solution of (7.78), that is, this process satisfies the
equation $\dot{X}(t) + A X(t) = Y(t)$, with $A \sim U(a_1, a_2)$, $0 < a_1 < a_2 < \infty$, and
$Y(t) = \cos(\nu t)$, $\nu > 0$. We have solved this problem by the SROM and stochastic
Galerkin methods in Examples 7.31 and 7.33. The collocation solution in (7.87)
becomes

\[
\tilde{X}(t, a) = \sum_{i=1}^{m} X^{(i)}(t)\,\ell_i(a)
= \sum_{i=1}^{m} X^{(i)}(t) \prod_{j=1,\, j \ne i}^{m} \frac{a - a^{(j)}}{a^{(i)} - a^{(j)}}, \tag{7.88}
\]
Fig. 7.22 Mapping $(t, a) \mapsto X(t, a)$ for $\tau = 1$, $a_1 = 0.1$, $a_2 = 5$, and $\nu = 5$

where $a^{(i)} \in A(\Omega) = [a_1, a_2]$, $i = 1, \ldots, m$, denote collocation points and $X^{(i)}(t)$
is given by (7.79). Figure 7.22 shows the variation of the solution $X(t, a)$ in $[0, \tau] \times
[a_1, a_2]$ for $\tau = 1$, $a_1 = 0.1$, $a_2 = 5$, and $\nu = 5$. Collocation solutions using $m > 4$
equally spaced points in $[a_1, a_2]$ are very accurate, in agreement with Theorem 7.18
since $X$ is continuous in $a \in [a_1, a_2]$ and varies slowly with this argument at all
times. ◊
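A collocation solution of this type can be sketched as follows. The fragment is an illustration under stated assumptions: the deterministic solutions $X^{(i)}(t)$ are evaluated from the closed form of $\dot{X} + aX = \cos(\nu t)$ with $X(0) = 0$, which stands in for (7.79), the interpolation functions are the Lagrange polynomials on equally spaced points, and the parameter values are those quoted for Fig. 7.22. The function names are hypothetical.

```python
from math import cos, sin, exp

def exact_solution(t, a, nu):
    """Solution of dX/dt + a X = cos(nu t), X(0) = 0, for a fixed value a of A."""
    return (a * cos(nu * t) + nu * sin(nu * t) - a * exp(-a * t)) / (a * a + nu * nu)

def lagrange_basis(a, nodes, i):
    """Lagrange polynomial equal to 1 at nodes[i] and 0 at the other collocation points."""
    value = 1.0
    for j, node in enumerate(nodes):
        if j != i:
            value *= (a - node) / (nodes[i] - node)
    return value

def collocation_solution(t, a, nodes, nu):
    """Interpolate the deterministic solutions at the collocation points."""
    return sum(exact_solution(t, node, nu) * lagrange_basis(a, nodes, i)
               for i, node in enumerate(nodes))

def equally_spaced(a1, a2, m):
    """m >= 2 equally spaced collocation points in [a1, a2], end points included."""
    return [a1 + k * (a2 - a1) / (m - 1) for k in range(m)]

nodes = equally_spaced(0.1, 5.0, 6)
grid = [0.1 + 4.9 * k / 200 for k in range(201)]
worst = max(abs(collocation_solution(1.0, a, nodes, 5.0) - exact_solution(1.0, a, 5.0))
            for a in grid)
print(worst)   # largest interpolation error over the grid at t = 1
```

Moments of the collocation solution follow by averaging `collocation_solution` over samples of $A$; each sample costs only a polynomial evaluation because the deterministic solutions at the collocation points are reused.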

Example 7.36 Let $X(t)$ be the solution of (7.78), that is, the equation $\dot{X}(t) + A X(t) =
Y(t)$, with $A = 1$, $Y(t) = \cos(\Theta t)$, $\Theta \sim U(\nu_1, \nu_2)$, $0 < \nu_1 < \nu_2 < \infty$.
We have solved this problem by the SROM and stochastic Galerkin methods in
Examples 7.32 and 7.34. The collocation solution $\tilde{X}(t, \nu)$ is given by (7.88) with
$(\nu, \nu^{(i)})$, $\nu^{(i)} \in [\nu_1, \nu_2]$, in place of $(a, a^{(i)})$ and $X^{(i)}(t)$ in (7.81).

The following numerical results are for $[\nu_1, \nu_2] = [1, 5]$ and equally spaced
collocation points in this interval. Figure 7.23 shows the mapping $(t, \nu) \mapsto \tilde{X}(t, \nu)$
for m = 4 (left panel) and m = 20 (right panel). The plots suggest that a relatively
large number of collocation points is needed to capture the dependence of $X(t, \nu)$
on the parameter $\nu$. The solid lines in Fig. 7.24 are estimates of $E[X(t)^2]$ obtained from
1000 independent samples of $X(t)$. The other lines are moments $E[\tilde{X}(t)^2]$ obtained
from a collocation solution with several values of m for $\tau = 5$ (left panel) and
$\tau = 10$ (right panel). The accuracy of $E[\tilde{X}(t)^2]$ relative to Monte Carlo estimates
of $E[X(t)^2]$ depends on the number m of collocation points but does not necessarily
improve with m, even for sets of refining collocation points. A sequence of collocation
points is refining if the points for a collocation solution with $m = m_1$ are included
in those for a collocation solution with $m = m_2 > m_1$. ◊

The results in Fig. 7.24 are consistent with Theorem 7.18 stating that the accuracy
of collocation solutions is not uniform in time. For example, the approximate moment
$E[\tilde{X}(t)^2]$ for m = 8 is superior to that for m = 4 in the time interval [0,4] but not in
[0,5]. That collocation solutions may not improve with m poses notable difficulties
in applications since m determines the computation effort.
Fig. 7.23 Mapping $(t, \nu) \mapsto \tilde{X}(t, \nu)$ for $\tau = 10$, $[\nu_1, \nu_2] = [1, 5]$, m = 4 (left panel) and m = 20
(right panel)

Fig. 7.24 Monte Carlo estimates of $E[X(t)^2]$ and approximations $E[\tilde{X}(t)^2]$ corresponding to
various values of m for $\tau = 5$ (left panel) and $\tau = 10$ (right panel)


Theorem 7.20 Under the conditions of Theorem 7.18, it is possible to construct a
collocation solution $\tilde{X}(t)$ whose marginal and finite dimensional distributions are
as close as desired to the corresponding distributions of $X(t)$.

Proof The discrepancy between the characteristic functions of $X(t)$ and $\tilde{X}(t)$ is

\[
\begin{aligned}
\big|E[e^{iuX(t)}] - E[e^{iu\tilde{X}(t)}]\big|
&= \big|E\big[e^{iuX(t)}\big(1 - e^{iu(\tilde{X}(t) - X(t))}\big)\big]\big|
 \le \Big(E\big[\big|1 - e^{iu(\tilde{X}(t) - X(t))}\big|^2\big]\Big)^{1/2} \\
&= \Big(E\big[\big(1 - \cos(u(\tilde{X}(t) - X(t)))\big)^2 + \sin^2\big(u(\tilde{X}(t) - X(t))\big)\big]\Big)^{1/2} \\
&\le \sqrt{2}\,|u|\,\Big(E\big[(\tilde{X}(t) - X(t))^2\big]\Big)^{1/2}
\end{aligned}
\]

by the Cauchy-Schwarz inequality, properties of the characteristic function, and
the inequalities $\sin^2(x) \le x^2$ and $(1 - \cos(x))^2 \le x^2$. For an arbitrary $\varepsilon' > 0$ and
$0 < u < \infty$, there is a collocation approximation $\tilde{X}$ such that $\|\tilde{X}(t, \cdot) - X(t, \cdot)\|_\infty <
\varepsilon = \varepsilon'/(\sqrt{2}\,u)$ so that $|E[e^{iuX(t)}] - E[e^{iu\tilde{X}(t)}]| < \varepsilon'$. Since $u$ is arbitrary, we
conclude that it is possible to construct a collocation solution $\tilde{X}(t)$ such that the
marginal characteristic functions of $X(t)$ and $\tilde{X}(t)$ are as close as desired. Similar
arguments hold for the characteristic functions of the vectors $(X(t_1), \ldots, X(t_n))$ and
$(\tilde{X}(t_1), \ldots, \tilde{X}(t_n))$, where $n > 1$ is an integer and $(t_1, \ldots, t_n)$ are points in $[0, \tau]$. ▲

Example 7.37 Let $\varphi(u; t) = E[\exp(iuX(t))]$ and $\tilde{\varphi}(u; t) = E[\exp(iu\tilde{X}(t))]$ denote
the characteristic functions of $X(t)$ and $\tilde{X}(t)$ in Example 7.36. The top left, top right,
and bottom left panels in Fig. 7.25 show characteristic functions $\tilde{\varphi}(u; t)$ for m = 4,
m = 8, and m = 20 at t = 4.5. The bottom right panel shows the characteristic
function $\varphi(u; t)$ at t = 4.5. The solid and dotted lines are the real and imaginary
parts of the approximate and exact characteristic functions. Numerical values are
for $\Theta$ uniformly distributed in $[\nu_1, \nu_2] = [1, 5]$, as in Example 7.36. Monte Carlo
estimates are based on 1000 independent samples.

The plots are consistent with the statement of Theorem 7.20 in the sense that it is
possible to construct a collocation solution such that $\tilde{\varphi}(u; t)$ is as close as desired
to $\varphi(u; t)$ at an arbitrary but fixed $t$, for example, $\tilde{\varphi}(u; t)$ for m = 20 and t = 4.5.
The plots are also consistent with a comment in Example 7.36 that the accuracy
of collocation solutions may not increase monotonically with m. For example, the
discrepancy $\max_{-5 \le u \le 5} |\varphi(u; t) - \tilde{\varphi}(u; t)|$ between $\varphi(u; t)$ and $\tilde{\varphi}(u; t)$ increases
from 0.1886 for m = 4 to 0.2350 for m = 8. ◊

We conclude this section with comments on similarities and differences between 
solutions of stochastic differential equations with random coefficients by the SROM, 
stochastic Galerkin, and stochastic collocation methods. 

Solutions of stochastic differential equations by SROMs and stochastic collocation
are constructed from deterministic solutions of these equations corresponding
to samples of their random coefficients and driving noise processes. The stochastic
collocation method requires that the random elements in the definition of stochastic
equations be described by parametric models depending on a random vector $\Theta$
such that $\Theta(\Omega)$ is a bounded set. The selection of the number and the location of
collocation points in $\Theta(\Omega)$ relates to the computation effort and the accuracy of
the polynomial representations in the probability space. The probability law of $\Theta$ is
not used to select collocation points. It is common to assume that $\Theta$ has independent
coordinates. SROM-based solutions do not require approximating the random
entries of stochastic equations by parametric models. However, the current version
of ESROMs assumes that all random entries can be modeled by parametric models.
The samples of the reduced order models used to approximate random coefficients
and inputs as well as their probabilities are selected via optimization algorithms with
objective functions quantifying discrepancies between features of the probability
laws of target random elements and their SROMs.

There are no similarities between SROM- and Galerkin-based solutions. The stochastic
Galerkin and collocation methods are similar in the sense that both methods
use parametric models to describe approximately the random functions in the definition
of stochastic equations. In contrast to the collocation method that constructs the
Fig. 7.25 Characteristic functions $\tilde{\varphi}(u; t)$ for m = 4 (top left panel), m = 8 (top right panel),
and m = 20 (bottom left panel) and characteristic function $\varphi(u; t)$ (bottom right panel) at t = 4.5


solution of a stochastic equation by interpolating between solutions of this equation 
corresponding to collocation points, the stochastic Galerkin method selects a finite 
stochastic basis and assumes that the solution belongs to the subspace spanned by 
this basis. The coefficients of the solution are calculated from a system of equations 
obtained by projecting the proposed form of the solution on the members of the 
selected basis. 


7.4.9 Taylor, Perturbation, and Neumann Series 

We consider stochastic differential equations with random coefficients and input that 
have small uncertainty. Examples illustrate the solution of this class of equations by 
the Taylor, perturbation, and Neumann series methods. 

Example 7.38 Let $X(t)$ be the solution of $(d/dt - \alpha + Z\,X(t)^2)\,X(t) = Q \sin(\nu t)$, $t > 0$,
where $(Z, Q)$ are uncorrelated random variables with finite variance, and $\alpha, \nu > 0$
are constants. Let $X(t; \mu_z, \mu_q)$, $V_z(t; \mu_z, \mu_q)$, and $V_q(t; \mu_z, \mu_q)$ denote the solution
$X(t; Z, Q)$ of this equation and its partial derivatives with respect to $Z$ and $Q$
Fig. 7.26 Functions $X(t; \mu_z, \mu_q)$, $V_z(t; \mu_z, \mu_q)$, and $V_q(t; \mu_z, \mu_q)$ for $\alpha = 1$, $\mu_z = -0.5$,
$\mu_q = 1$, and $\nu = 5$


evaluated at the mean $(\mu_z, \mu_q)$ of $(Z, Q)$. The first order Taylor approximation of the
solution $X(t; Z, Q)$ viewed as a function of $(Z, Q)$ is

\[
X(t; Z, Q) \simeq X(t; \mu_z, \mu_q) + V_z(t; \mu_z, \mu_q)\,(Z - \mu_z) + V_q(t; \mu_z, \mu_q)\,(Q - \mu_q),
\]

so that the approximate mean and variance functions are $E[X(t)] \simeq X(t; \mu_z, \mu_q)$
and $\mathrm{Var}[X(t)] \simeq V_z(t; \mu_z, \mu_q)^2\,\sigma_z^2 + V_q(t; \mu_z, \mu_q)^2\,\sigma_q^2$, where $\sigma_z^2$ and $\sigma_q^2$ denote the
variances of $Z$ and $Q$. Note that these moments depend only on the first two moments
of $Z$ and $Q$.

Figure 7.26 shows the deterministic functions $X(t; \mu_z, \mu_q)$, $V_z(t; \mu_z, \mu_q)$, and
$V_q(t; \mu_z, \mu_q)$ for $\alpha = 1$, $\mu_z = -0.5$, $\mu_q = 1$, $\nu = 5$, and $X(0) = 0$. We use these
functions to calculate the mean and variance of $X(t)$ approximately. ◊

Proof The approximate representation for $X(t; Z, Q)$ consists of the first terms of
the Taylor series expansion of $X(t; Z, Q)$ viewed as a function of $(Z, Q)$. The deterministic
functions $X(t; \mu_z, \mu_q)$, $V_z(t; \mu_z, \mu_q)$, and $V_q(t; \mu_z, \mu_q)$ are the solutions of
the differential equation for $X$ and of the derivatives of this equation with respect to
$Z$ and $Q$ for $(Z, Q)$ set equal to $(\mu_z, \mu_q)$. For example, $V_q(t; \mu_z, \mu_q)$ is the solution
of $(d/dt - \alpha)\,V_q(t; \mu_z, \mu_q) - 3\mu_z\,X(t; \mu_z, \mu_q)^2\,V_q(t; \mu_z, \mu_q) = \sin(\nu t)$. ▲
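The sensitivity equations can be integrated together with the state equation. The sketch below is an illustration only and rests on explicit assumptions: it adopts the sign convention of the sensitivity equation quoted for $V_q$, that is, the state equation $\dot{X} = \alpha X + Z X^3 + Q \sin(\nu t)$, takes $X(0) = V_z(0) = V_q(0) = 0$, and uses a classical RK4 scheme. The variances passed to `taylor_moments` and all function names are hypothetical.

```python
from math import sin

def sensitivities(z, q, alpha, nu, t_end, steps=4000):
    """RK4 solution (X, V_z, V_q) at t_end for dX/dt = alpha X + z X^3 + q sin(nu t)."""
    def rhs(t, s):
        x, vz, vq = s
        jac = alpha + 3 * z * x * x                 # derivative of the drift in x
        return (alpha * x + z * x ** 3 + q * sin(nu * t),
                jac * vz + x ** 3,                  # derivative of the equation in z
                jac * vq + sin(nu * t))             # derivative of the equation in q
    s, t, h = (0.0, 0.0, 0.0), 0.0, t_end / steps
    for _ in range(steps):
        k1 = rhs(t, s)
        k2 = rhs(t + h / 2, tuple(a + h / 2 * b for a, b in zip(s, k1)))
        k3 = rhs(t + h / 2, tuple(a + h / 2 * b for a, b in zip(s, k2)))
        k4 = rhs(t + h, tuple(a + h * b for a, b in zip(s, k3)))
        s = tuple(a + h / 6 * (b1 + 2 * b2 + 2 * b3 + b4)
                  for a, b1, b2, b3, b4 in zip(s, k1, k2, k3, k4))
        t += h
    return s

def taylor_moments(mu_z, mu_q, var_z, var_q, alpha, nu, t_end):
    """First order Taylor approximations of the mean and variance of X(t_end)."""
    x, vz, vq = sensitivities(mu_z, mu_q, alpha, nu, t_end)
    return x, vz ** 2 * var_z + vq ** 2 * var_q

# Hypothetical variances 0.01 for Z and Q; other values as quoted for Fig. 7.26.
print(taylor_moments(-0.5, 1.0, 0.01, 0.01, alpha=1.0, nu=5.0, t_end=2.0))
```

A simple consistency check is to compare $V_q$ with a central finite difference of the state with respect to $Q$; agreement indicates that the sensitivity equation matches the state equation that is actually integrated.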

Example 7.39 Let $X$ be the solution of $(d/dt + \beta + \varepsilon Y(t))\,X(t) = Q$, $t > 0$,
where $Y(t)$ is a stochastic process with mean zero, $Q$ denotes a random variable, and
$\varepsilon$ is a small parameter. The first order perturbation solution is $X(t) \simeq X^{(1)}(t) =
X_0(t) + \varepsilon X_1(t)$, where $X_0(t) = (Q/\beta)\,(1 - e^{-\beta t})$, $X_1(t) = -(Q/\beta)\,Z(t)\,e^{-\beta t}$,
and $Z(t) = \int_0^t (e^{\beta s} - 1)\,Y(s)\,ds$. Moments and other probabilistic properties of $X(t)$
can be calculated approximately from this or higher order perturbation solutions.

Figure 7.27 shows histograms of $X(t)$ and $X^{(1)}(t)$ for $X(0) = 0$, $\beta = 1$, $t = 5$,
$Q \sim U(0.5, 1.5)$, and $Y(t) = Y \sim U(0.5, 1.5)$ obtained from 1000 independent samples of
these solutions under the assumption that $Q$ and $Y$ are independent random variables.
Fig. 7.27 Histograms of the exact and first order perturbation solutions for $X(0) = 0$, $\beta = 1$,
$t = 5$, $Q \sim U(0.5, 1.5)$, and $Y(t) = Y \sim U(0.5, 1.5)$


The histograms of $X(t)$ and $X^{(1)}(t)$ are similar for $\varepsilon = 0.1$ but differ significantly for
$\varepsilon = 0.5$. ◊

Proof The random functions $X_0$ and $X_1$ satisfy the differential equations $\dot{X}_0(t) +
\beta X_0(t) = Q$ with $X_0(0) = 0$ and $\dot{X}_1(t) + \beta X_1(t) = -Y(t)\,X_0(t)$ with $X_1(0) = 0$,
so that they have the stated solutions. We have $E[X^{(1)}(t)] = (E[Q]/\beta)(1 - e^{-\beta t})$
and $E[X^{(1)}(s)X^{(1)}(t)] = E[(X_0(s) + \varepsilon X_1(s))(X_0(t) + \varepsilon X_1(t))] = E[X_0(s)X_0(t)] +
\varepsilon E[X_0(s)X_1(t) + X_1(s)X_0(t)] + \varepsilon^2 E[X_1(s)X_1(t)]$. Note that $E[X(s)X(t)]$ would
include two additional terms of order $\varepsilon^2$ if based on higher order perturbations, the
expectations $E[X_0(s)X_2(t)]$ and $E[X_2(s)X_0(t)]$. ▲
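The effect of the size of the small parameter can be illustrated for a time-invariant $Y(t) = Y$, for which the exact solution is available in closed form. The fragment below is a sketch under these assumptions: $X(0) = 0$, constant $Y$, so that $Z(t) = Y\,((e^{\beta t} - 1)/\beta - t)$, and sample values $Q = Y = 1$ chosen only for illustration. The function names are hypothetical.

```python
from math import exp

def exact(t, q, y, beta, eps):
    """Exact solution of dX/dt + (beta + eps y) X = q, X(0) = 0, for constant y."""
    rate = beta + eps * y
    return q / rate * (1 - exp(-rate * t))

def first_order(t, q, y, beta, eps):
    """First order perturbation solution X_0(t) + eps X_1(t) for constant y."""
    x0 = q / beta * (1 - exp(-beta * t))
    z = y * ((exp(beta * t) - 1) / beta - t)        # Z(t) for Y(s) = y
    x1 = -(q / beta) * z * exp(-beta * t)
    return x0 + eps * x1

for eps in (0.1, 0.5):
    err = abs(first_order(5.0, 1.0, 1.0, 1.0, eps) - exact(5.0, 1.0, 1.0, 1.0, eps))
    print(eps, err)   # the error grows roughly like eps squared
```

The discrepancy is of order $\varepsilon^2$, which is why the perturbation solution degrades visibly when $\varepsilon$ increases from 0.1 to 0.5.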

Example 7.40 Let $X(t)$ be the solution of $(d/dt + \beta + Y)\,X(t) = Q$, $t \in [0, \tau]$,
where $Y$ and $Q$ are independent random variables and $|Y|/\tau < 1$ a.s. The Neumann
series for $X(t)$ is absolutely and uniformly convergent a.s. in $[0, \tau]$, and

\[
X(t) = f(t) - \lambda \int_0^{\tau} H(t, s; \lambda)\,f(s)\,ds, \tag{7.89}
\]

where $\lambda = -1$, $f(t) = (Q/\beta)(1 - e^{-\beta t})$, $H(t, s; \lambda) = -\sum_{r=1}^{\infty} \lambda^{r-1} K_r(t, s)$, and
the kernels $K_r(t, s)$ can be obtained from $K_r(t, s) = \int_0^{\tau} K(t, \sigma)\,K_{r-1}(\sigma, s)\,d\sigma$ for
$r \ge 2$ with $K_1(t, s) = K(t, s) = 1(0 < s < t)\,Y \exp(-\beta(t - s))$ ([4], Sect. 8.4.1.4).
Figure 7.28 shows with solid and dotted lines the exact and approximate mean and
standard deviation of $X(t)$, $t \in [0, \tau]$, for $Y \sim U(-0.5, 0.5)$, $Q \sim U(0.5, 1.5)$,
Fig. 7.28 Exact and approximate mean and variance of $X$ for $\beta = 1$, $X(0) = 0$,
$Y \sim U(-0.5, 0.5)$, $Q \sim U(0.5, 1.5)$, and $\tau = 2$


$\tau = 2$, $\beta = 1$, and $X(0) = 0$. The dotted and solid lines coincide at the scale
of the figure. The approximate solution consists of the first two terms of the above
Neumann series. The approximate mean and standard deviation are accurate in this
case. ◊

Proof The integral form of the version $\dot{X}(t) = -\beta X(t) + [-Y X(t) + Q]$ of the
defining equation for $X(t)$ is

\[
X(t) = f(t) + \lambda \int_0^{\tau} K(t, s)\,X(s)\,ds, \tag{7.90}
\]

so that $X(t)$ satisfies a Fredholm equation. If the kernel $K$ is square integrable on
$[0, \tau] \times [0, \tau]$ and $|\lambda| < \|K\|^{-1}$, $\|K\|^2 = \int_{[0,\tau]^2} K(t, s)^2\,dt\,ds$, then $X(t)$ admits
the infinite series representation in (7.89). Useful information on Neumann series
can be found in many textbooks ([4], Sect. 8.4.1.4, [48], pp. 266-269, [49],
Chap. 3, and [50], pp. 49-53). ▲
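Partial sums of the Neumann series can be generated by successive substitution. The sketch below is an illustration under stated assumptions: it treats one sample $(q, y)$ of $(Q, Y)$, uses $\lambda = -1$ and the kernel $K(t, s) = 1(s \le t)\,y\,e^{-\beta(t - s)}$, and evaluates the integrals by the trapezoidal rule on a uniform grid. The sample values and the function name are hypothetical.

```python
from math import exp

def neumann_partial_sum(q, y, beta, tau, terms, n=200):
    """Partial sum with `terms` terms of the Neumann series on a uniform grid of [0, tau]."""
    h = tau / n
    ts = [k * h for k in range(n + 1)]
    f = [q / beta * (1 - exp(-beta * t)) for t in ts]
    term, total = f[:], f[:]
    for _ in range(terms - 1):
        new_term = []
        for k, t in enumerate(ts):
            # Trapezoidal rule for int_0^t y exp(-beta (t - s)) term(s) ds.
            vals = [y * exp(-beta * (t - ts[j])) * term[j] for j in range(k + 1)]
            integral = h * (sum(vals) - 0.5 * (vals[0] + vals[-1])) if k > 0 else 0.0
            new_term.append(-integral)               # factor lambda = -1
        term = new_term
        total = [a + b for a, b in zip(total, term)]
    return ts, total

q, y, beta, tau = 1.0, 0.5, 1.0, 2.0
exact_end = q / (beta + y) * (1 - exp(-(beta + y) * tau))
for terms in (1, 2, 12):
    _, x = neumann_partial_sum(q, y, beta, tau, terms)
    print(terms, abs(x[-1] - exact_end))   # error at t = tau decreases with the number of terms
```

For a single sample with $|y| = 0.5$ the two-term approximation still has a visible error at $t = \tau$; the errors partly cancel when averaging over a distribution of $Y$ that is symmetric about zero.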


7.5 Applications 

Methods for solving stochastic equations with random coefficients and input devel- 
oped in the first part of this chapter are applied to study problems involving stochastic 
stability, noise induced transitions, uncertain dynamic systems, and reliability for 
degrading systems. Numerical examples are presented to illustrate the implementa- 
tion of various methods and assess their accuracy. 
7.5.1 Stochastic Stability


Instability is the cause of spectacular failures in structural and mechanical systems, 
and can occur under time invariant/variant, deterministic/random actions. A system 
is said to be stable under a specified action if its response remains bounded at all 
times. If a real-valued process $X(t)$ denotes a system response, we say that the system
is stable if for arbitrary $\varepsilon > 0$ there exists $\delta_\varepsilon > 0$ such that $|X(t)| < \varepsilon$, $t > 0$,
if $|X(0)| < \delta_\varepsilon$. The system is said to be asymptotically stable if $|X(t)| \to 0$ in
some sense as $t \to \infty$. Our discussion is limited to the asymptotic stability of the
trivial solution X(t) = 0 for dynamic systems driven by white noise. A broad range 
of results on the stability of dynamic systems under deterministic and stochastic 
actions can be found in [6, 51-53]. 

Let $X(t)$ be a real-valued stochastic process defined by (7.1) with drift and diffusion
coefficients depending on only $X(t)$ rather than $X(t)$ and $Y(t)$. Moreover, the drift and
diffusion coefficients are linear in the state, that is, $X(t)$ is the solution of

\[
dX(t) = -\beta X(t-)\,dt + \sigma_1 X(t-)\,dB(t) + \sigma_2 X(t-)\,dC(t), \quad t > 0, \tag{7.91}
\]

where $\beta, \sigma_1, \sigma_2 > 0$ are constants, $X(t-) = \lim_{s \uparrow t} X(s)$ denotes the left limit of
the state at time $t$, $B(t)$ denotes a standard Brownian motion, and

\[
C(t) = \sum_{k=1}^{N(t)} Y_k, \quad t > 0, \tag{7.92}
\]

is a compound Poisson process with iid jumps $\{Y_k\}$ and jump times $0 = T_0 < T_1 <
\cdots < T_k < \cdots$ corresponding to a homogeneous Poisson process $N(t)$ with intensity
$\lambda > 0$. It is assumed that $B(t)$ and $C(t)$ are mutually independent and that the initial state
$X(0)$ is independent of these processes.

In random vibration, (7.91) is said to have multiplicative noise since its diffusion
coefficients are state dependent. Note also that (7.91) can be viewed as a stochastic
differential equation with random coefficients of the type in (7.1) in which the
coordinates of $Y(t)$ are the formal derivatives of $B(t)$ and $C(t)$.

Theorem 7.21 If $1 + \sigma_2 Y_1 > 0$ a.s., the solution of (7.91) is

\[
X(t) = X(0) \exp\!\big[ -(\beta + \sigma_1^2/2)\,t + \sigma_1 B(t) + C^*(t) \big], \quad t > 0, \tag{7.93}
\]

where $C^*(t) = \sum_{k=1}^{N(t)} Y_k^*$ and $Y_k^* = \ln(1 + \sigma_2 Y_k)$.

Proof The Ito formula for semimartingales (5.7) applied to

\[
V(t) = X(0) \exp[\rho t + \sigma_1 B(t) + C^*(t)], \quad t > 0,
\]

viewed as a function $g(t, u, v) = V(t)$ of $t$, $u = B(t)$, and $v = C^*(t)$ gives

\[
\begin{aligned}
V(t) - V(0) ={}& \int_{0+}^{t} \frac{\partial g(s, B(s), C^*(s-))}{\partial s}\,ds
+ \int_{0+}^{t} \frac{\partial g(s, B(s), C^*(s-))}{\partial u}\,dB(s)
+ \int_{0+}^{t} \frac{\partial g(s, B(s), C^*(s-))}{\partial v}\,dC^*(s) \\
&+ \frac{1}{2} \int_{0+}^{t} \frac{\partial^2 g(s, B(s), C^*(s-))}{\partial u^2}\,d[B, B](s) \\
&+ \sum_{0 < s \le t} \Big[ g(s, B(s), C^*(s)) - g(s, B(s), C^*(s-)) - \frac{\partial g(s, B(s), C^*(s-))}{\partial v}\,\Delta C^*(s) \Big],
\end{aligned}
\]

where $\Delta C^*(s) = C^*(s) - C^*(s-)$ denotes the jump of $C^*$ at $s$. This equation
simplifies to

\[
V(t) - V(0) = \rho \int_{0+}^{t} V(s-)\,ds + \sigma_1 \int_{0+}^{t} V(s-)\,dB(s)
+ \frac{\sigma_1^2}{2} \int_{0+}^{t} V(s-)\,ds + \sum_{0 < s \le t} \big[ V(s-)\big(\exp(\Delta C^*(s)) - 1\big) \big],
\]

by using the definition of $V(t)$, since $d[B, B](s) = ds$,

\[
\int_{0+}^{t} \big[\partial g(s, B(s), C^*(s-))/\partial v\big]\,dC^*(s) = \sum_{0 < s \le t} \big[\partial g(s, B(s), C^*(s-))/\partial v\big]\,\Delta C^*(s),
\]

\[
g(s, B(s), C^*(s)) - g(s, B(s), C^*(s-)) = V(s-)\big[\exp(\Delta C^*(s)) - 1\big],
\]

and $\exp(\Delta C^*(s)) - 1$ is equal to $\exp(Y_k^*) - 1 = \sigma_2 Y_k$ if $s = T_k$ and 0 if $s$ differs
from the jump times of $N(t)$, that is, $\exp(\Delta C^*(s)) - 1 = \sigma_2\,\Delta C(s)$. The differential
form of the above integral equation is

\[
dV(t) = (\rho + \sigma_1^2/2)\,V(t-)\,dt + \sigma_1 V(t-)\,dB(t) + \sigma_2 V(t-)\,dC(t),
\]

so that it coincides with the defining equation of $X(t)$ for $\rho = -\beta - \sigma_1^2/2$. ▲

Example 7.41 The solution of (7.91) with $\sigma_2 = 0$ in (7.93) can be obtained in
an alternative way by using Stratonovich rather than Ito calculus. The Stratonovich
version of this equation is

\[
dX(t) = (-\beta - \sigma_1^2/2)\,X(t)\,dt + \sigma_1 X(t) \circ dB(t), \quad t > 0, \tag{7.94}
\]

by using the Wong-Zakai correction (Theorem 5.10), which gives (7.93) with $\sigma_2 = 0$
by following the rules of the classical calculus. ◊

The diffusion process $X(t)$ defined by (7.91) with $\sigma_2 = 0$ is referred to as geometric
Brownian motion. By analogy, we call $X(t)$ in this equation with $\sigma_1 = 0$
geometric compound Poisson process. In the remainder of this section, we examine
the asymptotic stability of the trivial solution of (7.91).

Theorem 7.22 Under the assumptions in Theorem 7.21, the trivial solution of (7.91)
is asymptotically stable a.s., that is, $\lim_{t \to \infty} X(t) = 0$ a.s., if

\[
-(\beta + \sigma_1^2/2) + \lambda E[\ln(1 + \sigma_2 Y_1)] < 0. \tag{7.95}
\]

Proof Consider the alternative form

\[
X(t) = X(0) \exp\!\big[ t\,\big( -\beta - \sigma_1^2/2 + \sigma_1 B(t)/t + C^*(t)/t \big) \big] = X(0)\,e^{t Z(t)} \tag{7.96}
\]

of (7.93). We need to establish conditions under which $Z(t)$ is strictly negative a.s. at
large times, and conclude that, under these conditions, the trivial solution of (7.91)
is asymptotically stable a.s.

The facts $B(t)/t \to 0$ and $C^*(t)/t \to \lambda E[\ln(1 + \sigma_2 Y_1)]$ a.s. as $t \to \infty$ ([54],
p. 496 and Theorem 3.3.2, p. 189) imply $\lim_{t \to \infty} Z(t) = -\beta - \sigma_1^2/2 + \lambda E[\ln(1 +
\sigma_2 Y_1)]$ a.s., so that $Z(t) < 0$ at large times if $-\beta - \sigma_1^2/2 + \lambda E[\ln(1 + \sigma_2 Y_1)] < 0$,
that is, the condition in (7.95). Note that this condition involves both system and
noise properties, through the parameter $\beta$, the intensity $\sigma_1$ of $B(t)$, the intensity $\sigma_2$
of $C(t)$, and the size and the frequency of the jumps of $C(t)$. ▲

With the notation in the previous theorem, $|X(t)/X(0)| = \exp(t Z(t))$ so
that $\ln(|X(t)/X(0)|)/t = Z(t)$. The limit $\lambda_{LE} = \lim_{t \to \infty} \ln(|X(t)/X(0)|)/t =
\lim_{t \to \infty} Z(t)$, referred to as Lyapunov exponent, is equal to the left side of the
inequality in (7.95). If $\lambda_{LE} < 0$, the trivial solution of (7.91) is asymptotically stable
a.s. Lyapunov exponents are used extensively in stochastic stability studies ([6],
Chap. 8).
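The Lyapunov exponent can be evaluated from (7.95) and compared with the finite-time quantity $\ln(|X(t)/X(0)|)/t$ obtained from one sample of (7.93). The sketch below is an illustration under stated assumptions: the jumps are uniform in $(-a, a)$ with $\sigma_2 a < 1$, the intensity of the Poisson process is strictly positive, the expectation is computed by the midpoint rule, and the parameter values are those quoted for the Poisson white noise case ($\beta = -0.3$, $\sigma_1 = 0$, $\sigma_2 = 1$, $\lambda = 20$). The function names are hypothetical.

```python
import random
from math import log, sqrt

def lyapunov_exponent(beta, sigma1, sigma2, lam, a, n=20000):
    """Left side of (7.95) for Y_1 ~ U(-a, a); E[ln(1 + sigma2 Y_1)] by the midpoint rule."""
    h = 2 * a / n
    mean_log = sum(log(1 + sigma2 * (-a + (k + 0.5) * h)) for k in range(n)) * h / (2 * a)
    return -(beta + sigma1 ** 2 / 2) + lam * mean_log

def sample_exponent(beta, sigma1, sigma2, lam, a, t_end, rng):
    """ln(|X(t_end)/X(0)|)/t_end for one sample of the closed form solution (7.93)."""
    b = rng.gauss(0.0, sqrt(t_end))                 # B(t_end)
    c_star = 0.0
    t = rng.expovariate(lam)                        # first jump time of N(t)
    while t <= t_end:
        c_star += log(1 + sigma2 * rng.uniform(-a, a))
        t += rng.expovariate(lam)
    return (-(beta + sigma1 ** 2 / 2) * t_end + sigma1 * b + c_star) / t_end

lam = 20.0
a = sqrt(3 / lam)
print(lyapunov_exponent(-0.3, 0.0, 1.0, lam, a))
print(sample_exponent(-0.3, 0.0, 1.0, lam, a, 2000.0, random.Random(0)))
```

A negative value of the first number indicates a.s. asymptotic stability of the trivial solution; the second number fluctuates around it and approaches it as the time horizon increases.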

Example 7.42 For $\sigma_1 = \sigma_2 = 1$ and $Y_1 \sim U(-a, a)$ with $a = \sqrt{3/\lambda}$ such that
$1 + Y_1 > 0$ a.s., the stability condition in (7.95) becomes $-(\beta + 1/2) + \lambda E[\ln(1 +
Y_1)] < 0$ or, equivalently, $\beta > \beta(\lambda) = -1/2 + \lambda E[\ln(1 + Y_1)]$. The solid line in
Fig. 7.29 partitions the $(\beta, \lambda)$ plane into stability and instability regions for the trivial
solution of (7.91).

The stability region for the trivial solution of (7.91) is larger than the stability
regions for the trivial solutions of this equation under only Gaussian ($\sigma_2 = 0$) or
Poisson ($\sigma_1 = 0$) white noise, that is, the regions $\beta > -1/2$ and $\beta > \lambda E[\ln(1 + Y_1)]$.
Figure 7.30 shows samples of $X(t)$ corresponding to ($\beta = -0.3$, $\sigma_1 = 1$, $\sigma_2 = 0$) (left
panel), and ($\beta = -0.3$, $\sigma_1 = 0$, $\sigma_2 = 1$, $\lambda = 20$) (right panel), that is, parameters
for which the trivial solutions are stable. The samples in the figure approach 0 as
time increases in agreement with the stability condition in (7.95). ◊

Example 7.43 Let $X(t)$ be the solution of (7.91) with $\sigma_2 = 0$, so that $X(t)$ is a
geometric Brownian motion. The moment of order $r > 1$ of $X(t) \mid (X(0) = x)$
is $E[X(t)^r \mid X(0) = x] = x^r \exp[(-\beta r + r(r - 1)\sigma_1^2/2)\,t]$ so that it is stable as $t \to \infty$ if
$-\beta r + r(r - 1)\sigma_1^2/2 < 0$. Hence, the moments of order $r = 1$, 2, and 3 are stable
if $\beta > 0$, $\beta > \sigma_1^2/2$, and $\beta > \sigma_1^2$, respectively. Note that $\beta$ must satisfy distinct
conditions for the stability of moments of different orders and that moment stability
provides limited if any information on sample stability, that is, the stability condition
in (7.95). ◊
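The contrast between moment and sample stability can be made concrete with a short calculation. The sketch below is only an illustration; the values $\beta = 0.3$ and $\sigma_1 = 1$ are hypothetical and were chosen so that the samples and the mean are stable while the second and third moments are not.

```python
def moment_rate(r, beta, sigma1):
    """Exponential rate of E[X(t)^r | X(0) = x] for geometric Brownian motion."""
    return -beta * r + r * (r - 1) * sigma1 ** 2 / 2

def sample_rate(beta, sigma1):
    """Lyapunov exponent, that is, the left side of (7.95) with sigma_2 = 0."""
    return -(beta + sigma1 ** 2 / 2)

beta, sigma1 = 0.3, 1.0          # illustrative values, not taken from the text
for r in (1, 2, 3):
    print(r, moment_rate(r, beta, sigma1))    # negative rate means a stable moment
print(sample_rate(beta, sigma1))              # negative rate means a.s. sample stability
```

With these values the samples decay to zero a.s. although the second moment grows without bound, which is possible because rare samples with very large amplitudes dominate the higher order moments.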

Theorem 7.23 If $C(t)$ in (7.91) is replaced by $C_n(t) = \sum_{k=1}^{N_n(t)} Y_{n,k}$, $n = 1, 2, \ldots$,
where $N_n(t)$ are homogeneous Poisson processes with intensities $\lambda_n$ increasing
Fig. 7.29 Stability and instability regions for the trivial solution of (7.91) with
$\sigma_1 = \sigma_2 = 1$ and $Y_1 \sim U(-a, a)$, $a = \sqrt{3/\lambda}$

Fig. 7.30 Samples of $X(t)$ for ($\beta = -0.3$, $\sigma_1 = 1$, $\sigma_2 = 0$) (left panel) and
($\beta = -0.3$, $\sigma_1 = 0$, $\sigma_2 = 1$, $\lambda = 20$) (right panel)


monotonically with $n$ such that $\lim_{n \to \infty} \lambda_n = \infty$, and $Y_{n,k}$ are iid random variables
uniformly distributed in $[-a_n, a_n]$, $a_n = \sqrt{3/\lambda_n}$, then $C_n(t)$ converges weakly to
a standard Brownian motion as $n \to \infty$. The stability condition in (7.95) becomes
$\beta > -(\sigma_1^2 + \sigma_2^2)/2$ as $n \to \infty$.

Proof Note that the processes $C_n(t)$ and $B(t)$ are equal in the second moment sense
since $Y_{n,1} \sim U(-\sqrt{3/\lambda_n}, \sqrt{3/\lambda_n})$ by assumption, so that the mean and covariance
functions of $C_n(t)$ are $E[C_n(t)] = 0$ and $E[C_n(s) C_n(t)] = \lambda_n (s \wedge t) E[Y_{n,1}^2] = s \wedge t$.

We first show that, asymptotically as $n \to \infty$, $C_n(t)$ becomes a version of
a standard Brownian motion $B(t)$. Since $C_n(t)$ and $B(t)$ have stationary independent
increments, it is sufficient to show that their marginal characteristic functions
coincide as $n \to \infty$, that is, $\lim_{n\to\infty} \varphi_{C_n(t)}(u) = \varphi_{B(t)}(u)$, where
$\varphi_{C_n(t)}(u) = \exp[\lambda_n t (E[e^{iuY_{n,1}}] - 1)]$ and $\varphi_{B(t)}(u) = \exp(-u^2 t/2)$. Since the exponential is a
continuous function, it is sufficient to show $\lim_{n\to\infty} \lambda_n (E[e^{iuY_{n,1}}] - 1) = -u^2/2$. The
expectation of the Taylor expansion of $\exp(iuY_{n,1})$ about $u = 0$ is
$E[e^{iuY_{n,1}}] = 1 - \frac{u^2}{2} E[Y_{n,1}^2] + E[r(iuY_{n,1})]$, where $|r(v)| \le |v|^2 \alpha(|v|)$ and $\alpha : [0, \infty) \to \mathbb{R}$ is an
increasing function such that $\lim_{v \downarrow 0} \alpha(v) = 0$. We have

\[ \lambda_n \big(E[e^{iuY_{n,1}}] - 1\big) = \lambda_n \Big( -\frac{u^2}{2} E[Y_{n,1}^2] + E[r(iuY_{n,1})] \Big) = -\frac{u^2}{2} + \lambda_n E[r(iuY_{n,1})]. \]

Since $|\lambda_n E[r(iuY_{n,1})]| \le \lambda_n E[|(iuY_{n,1})^2 \alpha(|iuY_{n,1}|)|] \le \lambda_n u^2 E[Y_{n,1}^2]\, \alpha(|u| \sqrt{3/\lambda_n})$
and $\alpha(|u| \sqrt{3/\lambda_n}) \to 0$ as $n \to \infty$, we have $\lim_{n\to\infty} \varphi_{C_n(t)}(u) = \varphi_{B(t)}(u)$. It is
possible to extend the proof to distributions other than $Y_{n,1} \sim U(-\sqrt{3/\lambda_n}, \sqrt{3/\lambda_n})$.
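The limit used in this part of the proof can be checked numerically. The sketch below assumes the uniform jumps of the theorem, for which $E[e^{iuY_{n,1}}] = \sin(u a_n)/(u a_n)$ with $a_n = \sqrt{3/\lambda_n}$; the function name `poisson_exponent` is hypothetical.

```python
import math


def poisson_exponent(u, lam):
    """Return lam * (E[exp(i*u*Y)] - 1) for Y ~ U(-a, a) with a = sqrt(3/lam).

    For a symmetric uniform variable the characteristic function is real
    and equal to sin(u*a)/(u*a).
    """
    a = math.sqrt(3.0 / lam)
    ua = u * a
    cf = math.sin(ua) / ua if ua != 0.0 else 1.0
    return lam * (cf - 1.0)


if __name__ == "__main__":
    u = 1.0
    for lam in (1.0, 10.0, 100.0, 1.0e4):
        # The values approach -u**2/2 as the intensity lam grows.
        print(lam, poisson_exponent(u, lam), -u**2 / 2.0)
```

The exponent of the compound Poisson characteristic function approaches the Brownian value $-u^2/2$ at a rate of order $1/\lambda_n$, consistent with the remainder bound in the proof.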

We now show that the sequence of compound Poisson processes $\{C_n(t)\}$ converges
weakly to a standard Brownian motion $B(t)$ in $[0, 1]$. The consideration of the time interval
$[0, 1]$ is not restrictive. Let $C[0, 1]$ be the space of real-valued continuous functions
defined in $[0, 1]$. This space with the uniform metric $\rho(x, y) = \sup_{0 \le t \le 1} |x(t) - y(t)|$,
$x, y \in C[0, 1]$, is a separable complete metric space ([55], Appendix 1).
Denote by $\mathcal{C}$ the Borel $\sigma$-field generated by the family of open balls $\{y \in C[0, 1] : \rho(x, y) < r\}$,
$r > 0$, centered at $x \in C[0, 1]$, that is, the $\sigma$-field generated by the
uniform topology.

Let $D[0, 1]$ be the space of real-valued right continuous functions with left limits
defined in $[0, 1]$, so that $x(t+) = \lim_{s \downarrow t} x(s) = x(t)$ and $x(t-) = \lim_{s \uparrow t} x(s)$
exists at all times for all $x \in D[0, 1]$. It is assumed that $x(0) = x(0+)$ for all
members of $D[0, 1]$. Let $\Xi$ be the set of continuous, strictly increasing functions
$\xi : [0, 1] \to [0, 1]$ such that $\xi(0) = 0$ and $\xi(1) = 1$. The function
$\rho_D(x, y) = \inf_{\xi \in \Xi} \{ \sup_{0 \le t \le 1} |x(t) - y(\xi(t))| + \sup_{0 \le t \le 1} |t - \xi(t)| \}$ is a metric on $D[0, 1]$. Let $\mathcal{D}$ be
the Borel $\sigma$-field generated by the open balls $\{y \in D[0, 1] : \rho_D(x, y) < r\}$, $r > 0$,
centered at $x \in D[0, 1]$, that is, the $\sigma$-field generated by the Skorokhod topology.
Note that (1) a sequence $\{x_n \in D[0, 1], n = 1, 2, \ldots\}$ converges to an element $x \in D[0, 1]$
if and only if there exist functions $\xi_n \in \Xi$ such that $\lim_{n\to\infty} x_n(\xi_n(t)) = x(t)$
and $\lim_{n\to\infty} \xi_n(t) = t$, both uniformly in $t$, (2) Skorokhod convergence is
weaker than uniform convergence, and (3) the Skorokhod topology relativized to
$C[0, 1]$ coincides with the uniform topology.

Since the samples of $B$ and $C_n$ are elements of $C[0, 1]$ and $D[0, 1]$ and $C[0, 1] \subset D[0, 1]$,
we need to prove the convergence of $C_n$ to $B$ in $D[0, 1]$. The measure $W$ of $B$
on $(C[0, 1], \mathcal{C})$ can be extended to $(D[0, 1], \mathcal{D})$ by setting $W(A) = W(A \cap C[0, 1])$
for $A \in \mathcal{D}$ ([55], Sect. 16). The weak convergence $C_n \Rightarrow B$ follows from Theorem
15.1 in [55], stating that, if a family of probability measures $\{P_n\}$ is tight and if
$P_n \pi^{-1}_{t_1, \ldots, t_k} \Rightarrow P \pi^{-1}_{t_1, \ldots, t_k}$ holds whenever $t_1, \ldots, t_k$ all lie in $T_P$, then $P_n \Rightarrow P$.
The set $T_P$ consists of those times $t \in [0, 1]$ at which the projection function $\pi_t$ is
continuous except at points forming a set of $P$-measure 0.

The convergence $P_n \pi^{-1}_{t_1, \ldots, t_k} \Rightarrow P \pi^{-1}_{t_1, \ldots, t_k}$ of the finite dimensional distributions
of $C_n$ to those of $B$ corresponding to arbitrary $t_1, \ldots, t_k$ has already been shown. It
remains to show that the sequence of processes $\{C_n\}$ is tight. The sequence $\{C_n\}$ is
tight if (1) for each positive $\eta$, there exists $a$ such that $P(\sup_{0 \le t \le 1} |C_n(t)| > a) \le \eta$,
$n \ge 1$, and (2) for each positive $\varepsilon$ and $\eta$, there exist $\delta \in (0, 1)$ and $n_0 \in \{1, 2, \ldots\}$
such that $P(W'_n(\delta) \ge \varepsilon) \le \eta$, $n \ge n_0$, where

\[ W_n[t, t + \delta) = \sup\{|C_n(s') - C_n(s'')| : s', s'' \in [t, t + \delta)\}, \]
\[ W'_n(\delta) = \inf_{\{t_i\}} \max_{0 < i \le r} W_n[t_{i-1}, t_i), \]

and $0 = t_0 < t_1 < \cdots < t_r = 1$ is such that $t_i - t_{i-1} > \delta$ (the same modulus is defined for any $x \in D[0, 1]$). For $a > 0$
arbitrary, we have

\[ P\Big(\sup_{0 \le s \le t} |C_n(s)| \ge a\Big) \le P\Big(\sup_{0 \le s \le t} C_n(s) \ge a\Big) + P\Big(\inf_{0 \le s \le t} C_n(s) \le -a\Big) = 2 P\Big(\sup_{0 \le s \le t} C_n(s) \ge a\Big), \]

by the postulated symmetry of the density of $C_n$. Let $T_n(a) = \inf\{t \ge 0 : C_n(t) \ge a\}$
be the first time when $C_n$ exceeds $a$ and note that

\[ P(C_n(t) \ge a) = P(C_n(t) \ge a \mid T_n(a) \le t)\, P(T_n(a) \le t) \]

since $P(C_n(t) \ge a \mid T_n(a) > t) = 0$. The inequality $C_n(T_n(a)) \ge a$ and the
symmetry of the density of $C_n$ imply $P(C_n(t) \ge a \mid T_n(a) \le t) \ge 1/2$, so that

\[ P\Big(\sup_{0 \le s \le t} C_n(s) \ge a\Big) = P(T_n(a) \le t) \le 2 P(C_n(t) \ge a), \]
\[ P\Big(\sup_{0 \le s \le t} |C_n(s)| \ge a\Big) \le 4 P(C_n(t) \ge a) = 2 P(|C_n(t)| \ge a), \]
\[ P\Big(\sup_{0 \le s \le t} |C_n(s)| \ge a\Big) \le 2 P(|C_n(t)| \ge a) \le \frac{2 E[C_n(t)^2]}{a^2} = \frac{2 \lambda_n t E[Y_{n,1}^2]}{a^2} = \frac{2t}{a^2} \]

by Chebyshev's inequality and properties of $C_n(t)$. Hence, for each $\eta > 0$ there
exists $a = (2t/\eta)^{1/2}$ satisfying the first condition.

For the second condition, note that $W_n[T_{n,k-1}, T_{n,k}) = 0$ in the time intervals
between consecutive jumps of $C_n(t)$, so that $W'_n(\delta) = \max_{1 \le k \le N_n(t)} \{|Y_{n,k}|\}$. The
distribution of $W'_n(\delta)$ is $P(W'_n(\delta) \le y) = \sum_{q \ge 0} (2F_n(y) - 1)^q P(N_n(t) = q)$, so that
$P(W'_n(\delta) > \varepsilon) = 1 - \exp[-2\lambda_n t (1 - F_n(\varepsilon))]$, $\varepsilon > 0$, where $F_n$ denotes the
distribution of $Y_{n,1}$. The required condition is satisfied for $\eta = 1 - \exp[-2\lambda_n t (1 - F_n(\varepsilon))]$
since $\lambda_n (1 - F_n(\varepsilon)) \to 0$ as $n \to \infty$. Hence, the sequence of processes $\{C_n\}$ is tight.

Since the finite dimensional distributions of $\{C_n\}$ converge to those of $B$ and the
sequence of random elements $\{C_n\}$ is tight, we conclude that $\{C_n\}$ converges weakly
to $B$. ▲

According to Theorem 7.23, the stability of the trivial solution of (7.91) as $\lambda \to \infty$
coincides with that for the trivial solution of $dX(t) = -\beta X(t)\,dt + \sigma_1 X(t)\,dB(t) + \sigma_2 X(t)\,dB_1(t)$,
where $B(t)$ and $B_1(t)$ are independent standard Brownian motions.
The trivial solution of the latter equation is stable if $\beta > -(\sigma_1^2 + \sigma_2^2)/2$, that is,
$\beta > -1$ for $\sigma_1 = \sigma_2 = 1$, a result that is in agreement with the plot in Fig. 7.29, which
seems to approach $-1$ as $\lambda$ increases.

Let $X(t)$ be the solution of (7.91) in which the driving noise processes $B(t)$ and
$C(t)$ are replaced by an $\alpha$-stable Lévy process $L_\alpha(t)$, that is,

\[ dX(t) = -\beta X(t-)\,dt + \sigma X(t-)\,dL_\alpha(t), \quad t \ge 0, \tag{7.96} \]

where $\beta, \sigma > 0$ are constants. An approximate solution of this equation can be
obtained by replacing $L_\alpha(t)$ with

\[ L_{\alpha,a}(t) = \sigma(\alpha, a) B(t) + C_{\alpha,a}(t), \quad t \ge 0, \tag{7.97} \]

based on the fact that any $\alpha$-stable process can be viewed as a sum of two independent
processes, a scaled Brownian motion and a compound Poisson process $C_{\alpha,a}$
corresponding to the jumps of $L_\alpha$ with magnitude larger than an arbitrary constant
$a > 0$. We have seen in Sect. 3.7.6.3 that the accuracy of the representation in (7.97)
is remarkable. The differential equation for $X(t)$ in (7.96) with $L_{\alpha,a}(t)$ in place of
$L_\alpha(t)$ becomes

\[ dX(t) = -\beta X(t-)\,dt + \sigma X(t-)\big(\sigma(\alpha, a)\,dB(t) + dC_{\alpha,a}(t)\big), \quad t \ge 0, \tag{7.98} \]

that is, a stochastic differential equation of the type in (7.91). Theorem 7.22 cannot
be applied to study the stability of the trivial solution of this equation since the jumps
of $C_{\alpha,a}(t)$ do not satisfy the conditions of this theorem.

Example 7.44 Let $X(t)$ be the solution of $dX(t) = -\beta X(t-)\,dt + \sigma X(t-)\,dL_\alpha(t)$,
$t \ge 0$, where $\beta, \sigma > 0$ are constants and $L_\alpha(t)$ is an $\alpha$-stable process whose
increments $L_\alpha(t) - L_\alpha(s) \sim S_\alpha((t - s)^{1/\alpha}, 0, 0)$, $t > s$, are $\alpha$-stable random
variables with scale $(t - s)^{1/\alpha}$, skewness 0, and location 0. If $\alpha \in (1, 2]$, the mean of
$L_\alpha(t) - L_\alpha(s)$ exists and is 0 ([56], Property 1.2.19), so that the expectation of the
defining equation for $X(t)$ gives $\dot\mu(t) = -\beta \mu(t)$ with the notation $\mu(t) = E[X(t)]$
since $X(t-)$ and $dL_\alpha(t)$ are independent and $E[dL_\alpha(t)] = 0$. This gives
$\mu(t) = \mu(0) \exp(-\beta t)$ so that $\mu(t) \to 0$ as $t \to \infty$, that is, the first moment of $X(t)$ is
stable. ◊

Example 7.45 Suppose $X(t)$ is a real-valued stochastic process defined by

\[ \ddot X(t) + 2\beta \dot X(t) + [1 + \sigma_1 W_1(t)] X(t) = \sigma_2 W_2(t), \quad t \ge 0, \tag{7.99} \]

where $\beta > 0$, $\sigma_1$, and $\sigma_2$ are real constants, and $W_1$ and $W_2$ denote independent
Gaussian white noise processes. The moments $\mu(p, q; t) = E[X_1(t)^p X_2(t)^q]$ of
the bivariate process $Z(t) = (X_1(t) = X(t), X_2(t) = \dot X(t))$ satisfy the differential
equations

\[ \dot\mu(p, q; t) = p\,\mu(p - 1, q + 1; t) - q\,\mu(p + 1, q - 1; t) - 2q\beta\,\mu(p, q; t) + \frac{q(q-1)}{2} \sigma_1^2\, \mu(p + 2, q - 2; t) + \frac{q(q-1)}{2} \sigma_2^2\, \mu(p, q - 2; t) \]

with the convention $\mu(p, q; t) = 0$ if $p < 0$ and/or $q < 0$. This is a closed system of
equations, so that moments of any order of $X(t)$ and $\dot X(t)$ can be calculated exactly.
For example, the moments of order $p + q = 1$ and $p + q = 2$ result from

\[ \dot\mu(1, 0; t) = \mu(0, 1; t), \]
\[ \dot\mu(0, 1; t) = -\mu(1, 0; t) - 2\beta \mu(0, 1; t) \]

and

\[ \dot\mu(2, 0; t) = 2\mu(1, 1; t), \]
\[ \dot\mu(1, 1; t) = \mu(0, 2; t) - \mu(2, 0; t) - 2\beta \mu(1, 1; t), \]
\[ \dot\mu(0, 2; t) = -2\mu(1, 1; t) - 4\beta \mu(0, 2; t) + \sigma_1^2 \mu(2, 0; t) + \sigma_2^2. \]

Differential equations of the type in (7.99) provide useful models in applications.
For example, the deflection $V(x, t)$ of a simply supported beam with constant stiffness
$\chi$ and length $l > 0$ subjected to a fluctuating axial force $W(t)$ applied at its ends
satisfies the differential equation

\[ \chi \frac{\partial^4 V(x, t)}{\partial x^4} = -W(t) \frac{\partial^2 V(x, t)}{\partial x^2} - m \frac{\partial^2 V(x, t)}{\partial t^2}, \]

where $m$ is the beam mass per unit length. For the first buckling mode we have
$V(x, t) = X(t) \sin(\pi x / l)$ so that

\[ \ddot X(t) + \nu^2 \big(1 - W(t)/p_{cr}\big) X(t) = 0, \]

where $\nu^2 = \pi^4 \chi / (m l^4)$ and $p_{cr} = \pi^2 \chi / l^2$ is the first buckling load for the beam. ◊

Proof Since $W_1(t)$ and $W_2(t)$ are formal derivatives of standard Brownian processes
$B_1(t)$ and $B_2(t)$, $Z(t)$ is a diffusion process defined by the stochastic differential
equation

\[ dX_1(t) = X_2(t)\,dt, \]
\[ dX_2(t) = -[X_1(t) + 2\beta X_2(t)]\,dt - \sigma_1 X_1(t)\,dB_1(t) + \sigma_2\,dB_2(t). \]

The moment equations result from Itô's formula applied to the mapping $(X_1, X_2) \mapsto X_1^p X_2^q$
by averaging.

Since moment equations for fixed $p + q$ have the form $\dot m(t) = a\, m(t)$, the
moments of order $p + q$ of $Z(t)$ become time invariant as $t \to \infty$ if the eigenvalues
of $a$ have negative real parts. For example, the eigenvalues of $a$ for moments of order
$p + q = 1$ are $-\beta \pm \sqrt{\beta^2 - 1}$ so that, if $\beta > 0$, $X(t)$ and $\dot X(t)$ are asymptotically
stable in the mean. ▲
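The eigenvalue test for moment stability can be carried out numerically. The following is a minimal sketch that assembles the coefficient matrices of the first and second order moment equations listed in Example 7.45, with the state ordering $(\mu(1,0), \mu(0,1))$ and $(\mu(2,0), \mu(1,1), \mu(0,2))$; the function names are hypothetical and the constant forcing term $\sigma_2^2$ is omitted since it does not affect the eigenvalues.

```python
import numpy as np


def moment_matrices(beta, sigma1):
    """Coefficient matrices of the moment equations of orders 1 and 2."""
    # Order 1: state (mu(1,0), mu(0,1)).
    a1 = np.array([[0.0, 1.0],
                   [-1.0, -2.0 * beta]])
    # Order 2: state (mu(2,0), mu(1,1), mu(0,2)).
    a2 = np.array([[0.0, 2.0, 0.0],
                   [-1.0, -2.0 * beta, 1.0],
                   [sigma1**2, -2.0, -4.0 * beta]])
    return a1, a2


def is_asymptotically_stable(a):
    """True if all eigenvalues of a have negative real parts."""
    return bool(np.all(np.linalg.eigvals(a).real < 0.0))


if __name__ == "__main__":
    a1, a2 = moment_matrices(beta=0.5, sigma1=0.5)
    print(np.linalg.eigvals(a1))        # -beta +/- sqrt(beta**2 - 1)
    print(is_asymptotically_stable(a1), is_asymptotically_stable(a2))
```

For the second order moments the characteristic polynomial of the matrix has constant term $8\beta - 2\sigma_1^2$, so that stability is lost when the parametric noise intensity is large relative to the damping, even though the mean remains stable for any $\beta > 0$.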


7.5.2 Noise Induced Transitions 

Let $x(t)$, $t \ge 0$, denote the average population per unit area or any other scalar
measure for the size of a biological population. According to the Verhulst model,
$x(t)$ satisfies the differential equation

\[ \dot x(t) = \rho x(t) - x(t)^2, \quad t \ge 0, \tag{7.100} \]

where $\rho x(t)$, $\rho \in \mathbb{R}$, is the rate of population increase/decrease and $-x(t)^2$ relates
to available resources. Positive and negative values of $\rho$ correspond to favorable and
hostile environments. The solution,

\[ x(t) = x(0) e^{\rho t} \big[1 + x(0)(e^{\rho t} - 1)/\rho\big]^{-1}, \tag{7.101} \]

of (7.100), is asymptotically stable for $\rho \ne 0$ since $x_s = \lim_{t\to\infty} x(t) = 0$ for
$\rho < 0$ and $x_s = \rho$ for $\rho > 0$. The system undergoes a phase transition at $\rho = 0$
from $x_s = 0$ for $\rho < 0$ to $x_s = \rho$ for $\rho > 0$.
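The closed-form solution of the deterministic Verhulst model and its two asymptotic regimes can be verified directly. The sketch below evaluates (7.101); the function name `verhulst_solution` is hypothetical, and the branch for a zero growth rate uses the limit of the formula, which is an addition not stated in the text.

```python
import math


def verhulst_solution(t, x0, rho):
    """Solution (7.101) of dx/dt = rho*x - x**2 with x(0) = x0 > 0."""
    if rho == 0.0:
        # Limit of (7.101) as rho -> 0, since (exp(rho*t) - 1)/rho -> t.
        return x0 / (1.0 + x0 * t)
    e = math.exp(rho * t)
    return x0 * e / (1.0 + x0 * (e - 1.0) / rho)


if __name__ == "__main__":
    for rho in (-1.5, 1.5):
        # Long-time value: 0 for rho < 0 and rho for rho > 0.
        print(rho, verhulst_solution(50.0, 0.1, rho))
```

The long-time values reproduce the phase transition at a zero growth rate: the population dies out in a hostile environment and settles at the growth rate in a favorable one.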

Consider an extended version of the Verhulst model in which $\rho$ is replaced with
$\rho + \sigma\,dB(t)/dt + \gamma\,dC(t)/dt$, where $\sigma, \gamma$ are constants, $B(t)$ is a standard Brownian
motion, and $C(t)$ is a compound Poisson process defined by (7.92) with jumps $Y_k$ that
have mean 0 and finite variance. Note that the expectation of the random environment
described by $\rho + \sigma\,dB(t)/dt + \gamma\,dC(t)/dt$ is equal to $\rho$. For this environment, the
population size $X(t)$ satisfies the stochastic differential equation

\[ dX(t) = \big(\rho X(t-) - X(t-)^2\big)\,dt + X(t-)\big(\sigma\,dB(t) + \gamma\,dC(t)\big), \quad t \ge 0. \tag{7.102} \]


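Theorem 7.24 The characteristic function $\varphi(u; t) = E[\exp(iuX(t))]$ of $X(t)$ in
(7.102) is the solution of the integro-differential equation

\[ \frac{\partial \varphi(u; t)}{\partial t} = \rho u \frac{\partial \varphi(u; t)}{\partial u} + i u \frac{\partial^2 \varphi(u; t)}{\partial u^2} + \frac{\sigma^2 u^2}{2} \frac{\partial^2 \varphi(u; t)}{\partial u^2} + \lambda \Big[ \int_{\mathbb{R}} \varphi(u(1 + \gamma y); t)\,dF(y) - \varphi(u; t) \Big], \tag{7.103} \]

where $F(y)$ denotes the distribution of the jumps of $C(t)$.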

Proof Arguments similar to those in [4] (Sect. 9.4.3) are used to establish (7.103).
Itô's formula applied to the mapping $X(t) \mapsto \exp(iuX(t))$ gives

\[ e^{iuX(t)} - e^{iuX(0)} = \int_{0+}^{t} iu\, e^{iuX(s-)}\,dX(s) + \frac{1}{2} \int_{0+}^{t} (iu)^2 e^{iuX(s-)}\,d[X, X]^c(s) + \sum_{0 < s \le t} \big[ e^{iuX(s)} - e^{iuX(s-)} - iu\, e^{iuX(s-)} \Delta X(s) \big] \]
\[ = \int_{0+}^{t} iu\, e^{iuX(s-)} \big( (\rho X(s-) - X(s-)^2)\,ds + \sigma X(s-)\,dB(s) \big) + \frac{(iu)^2 \sigma^2}{2} \int_{0+}^{t} e^{iuX(s-)} X(s-)^2\,ds + \sum_{0 < s \le t} \big[ e^{iuX(s)} - e^{iuX(s-)} \big] \]

since the increment of the continuous part of the quadratic variation of $X(s)$ is
$d[X, X]^c(s) = \sigma^2 X(s-)^2\,ds$ and

\[ \int_{0+}^{t} (iu)\, e^{iuX(s-)} \gamma X(s-)\,dC(s) = \sum_{0 < s \le t} iu\, e^{iuX(s-)} \Delta X(s). \]

The expectation of the above equation can be calculated term by term and gives

\[ \varphi(u; t) - \varphi(u; 0) = iu\, E\Big[ \int_{0+}^{t} e^{iuX(s-)} \big(\rho X(s-) - X(s-)^2\big)\,ds \Big] - \frac{\sigma^2 u^2}{2} E\Big[ \int_{0+}^{t} e^{iuX(s-)} X(s-)^2\,ds \Big] + E\Big[ \sum_{0 < s \le t} \big( e^{iuX(s)} - e^{iuX(s-)} \big) \Big]. \]

The first three terms on the right side of (7.103) result from the first two expectations
in the previous formula by applying Fubini's theorem and using properties of the
characteristic function. The fourth term on the right side of (7.103) follows from

\[ E\Big[ \sum_{0 < s \le t} \big( e^{iuX(s)} - e^{iuX(s-)} \big) \Big] = E\Big[ E\Big[ \sum_{k=1}^{N(t)} \big( e^{iuX(T_k)} - e^{iuX(T_k-)} \big) \,\Big|\, N(t) \Big] \Big], \]

$X(T_k) = X(T_k-)(1 + \gamma Y_k)$, the independence of $Y_k$ from $X(T_k-)$, the property
$P(N(t + \Delta t) - N(t) \ge 1) \approx \lambda \Delta t$ of $N(t)$ that holds for $\lambda \Delta t \ll 1$, and the fact that,
conditional on $N(t)$, the jump times $T_k$ are independent random variables distributed
uniformly in $(0, t)$. ▲

Example 7.46 Consider the special case of (7.102) with $\gamma = 0$, so that $X(t)$ is a
diffusion process defined by $dX(t) = (\rho X(t) - X(t)^2)\,dt + \sigma X(t)\,dB(t)$. Then (7.103)
becomes a partial differential equation for $\varphi(u; t)$. Its Fourier transform (Theorem
7.4) constitutes the Fokker-Planck equation for the density $f(x; t)$ of $X(t) \mid (X(0) = x_0)$,
and admits the stationary solutions $f_s(x) = c\, x^{2(\rho/\sigma^2 - 1)} \exp(-2x/\sigma^2)$, $x > 0$,
for $\rho > \sigma^2/2$ and $f_s(x) = \delta(x)$ for $\rho < \sigma^2/2$, where $c > 0$ is a constant. Figure 7.31
shows densities $f_s(x)$ for $\sigma = 1$, $\rho < \sigma^2/2$, $\sigma^2/2 < \rho < \sigma^2$, and $\rho > \sigma^2$. The
stationary density changes qualitatively at $\rho = \sigma^2/2$ and $\rho = \sigma^2$. Since similar
changes in $f_s(x)$ result if $\rho$ is kept constant and noise intensity $\sigma$ is varied, the
qualitative changes in $f_s(x)$ are referred to as noise induced transitions. ◊
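The stationary density of Example 7.46 is a Gamma density, which makes its normalizing constant and its qualitative changes easy to examine. The sketch below assumes the form $x^{2(\rho/\sigma^2 - 1)} \exp(-2x/\sigma^2)$ quoted in the example and identifies it with a Gamma density of shape $2\rho/\sigma^2 - 1$ and scale $\sigma^2/2$; the function name `stationary_density` is hypothetical.

```python
import math


def stationary_density(x, rho, sigma):
    """Stationary density of the Verhulst diffusion for rho > sigma**2/2, x > 0.

    Written as a Gamma density with shape k = 2*rho/sigma**2 - 1 and
    scale theta = sigma**2/2, so that x**(k-1) = x**(2*(rho/sigma**2 - 1)).
    """
    k = 2.0 * rho / sigma**2 - 1.0
    if k <= 0.0:
        raise ValueError("density is a point mass at 0 for rho <= sigma**2/2")
    theta = sigma**2 / 2.0
    c = 1.0 / (math.gamma(k) * theta**k)   # normalizing constant
    return c * x**(k - 1.0) * math.exp(-x / theta)


if __name__ == "__main__":
    dx = 1.0e-3
    # Midpoint rule check that the density integrates to one (rho = 2, sigma = 1).
    total = sum(stationary_density((j + 0.5) * dx, 2.0, 1.0) * dx
                for j in range(20000))
    print(total)
```

With this parametrization the density is unbounded at the origin for $\sigma^2/2 < \rho < \sigma^2$ and has an interior mode at $\rho - \sigma^2$ for $\rho > \sigma^2$, which is the second qualitative change mentioned in the example.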

If $\sigma = 0$, (7.103) takes the form

\[ \frac{\partial \varphi(u; t)}{\partial t} = \rho u \frac{\partial \varphi(u; t)}{\partial u} + i u \frac{\partial^2 \varphi(u; t)}{\partial u^2} + \lambda \Big[ \int_{\mathbb{R}} \varphi(u(1 + \gamma y); t)\,dF(y) - \varphi(u; t) \Big], \tag{7.104} \]

and cannot be solved analytically. The integro-differential equation (7.104) can be
used to find $X(t)$ for the Gaussian random environment $\rho + \gamma\,dB(t)/dt$ by increasing
indefinitely the arrival rate of the jumps of $C(t)$ and simultaneously reducing their
size, as shown by the following theorem.

Theorem 7.25 If $E[Y_1] = 0$, $\lambda E[Y_1^2] = 1$, $P(|Y_1| \le a) = 1$, and $a > 0$ is such
that $a \to 0$ as $\lambda \to \infty$, then (7.104) becomes

\[ \frac{\partial \varphi(u; t)}{\partial t} = \rho u \frac{\partial \varphi(u; t)}{\partial u} + i u \frac{\partial^2 \varphi(u; t)}{\partial u^2} + \frac{\gamma^2 u^2}{2} \frac{\partial^2 \varphi(u; t)}{\partial u^2} \tag{7.105} \]

as $\lambda \to \infty$, that is, $X(t)$ is the solution of the Verhulst model with Gaussian rate
$\rho + \gamma\,dB(t)/dt$.

Fig. 7.31 Stationary density $f_s(x)$ for $\sigma = 1$, $\rho = 0$ ($\rho < \sigma^2/2$), $\rho = 3/4$
($\sigma^2/2 < \rho < \sigma^2$), and $\rho = 2$ ($\rho > \sigma^2$)

Proof The jumps $Y_k \sim U(-a, a)$, $a = \sqrt{3/\lambda}$, have the required properties. For
these jumps, the term in square brackets of (7.104) is

\[ \lambda \Big[ \int_{\mathbb{R}} \varphi(u(1 + \gamma y); t)\,dF(y) - \varphi(u; t) \Big] \]
\[ = \lambda \Big[ \int_{\mathbb{R}} \Big( \varphi(u; t) + (u \gamma y) \frac{\partial \varphi(u; t)}{\partial u} + \frac{(u \gamma y)^2}{2} \frac{\partial^2 \varphi(u; t)}{\partial u^2} + r(u \gamma y) \Big)\,dF(y) - \varphi(u; t) \Big] \]
\[ = \lambda \Big[ u \gamma \frac{\partial \varphi(u; t)}{\partial u} E[Y_1] + \frac{(u \gamma)^2}{2} \frac{\partial^2 \varphi(u; t)}{\partial u^2} E[Y_1^2] + \int_{\mathbb{R}} r(u \gamma y)\,dF(y) \Big] \]
\[ = \frac{(u \gamma)^2}{2} \frac{\partial^2 \varphi(u; t)}{\partial u^2} + \lambda \int_{\mathbb{R}} r(u \gamma y)\,dF(y), \]

where $\varphi(u(1 + \gamma y); t)$ has been expanded in Taylor's series about $u$, $|r(u \gamma y)| \le (u \gamma y)^2 \alpha(|u \gamma y|)$,
and $\alpha : [0, \infty) \to [0, \infty)$ is a monotonically increasing function
such that $\lim_{\xi \downarrow 0} \alpha(\xi) = 0$. Since

\[ \Big| \lambda \int_{\mathbb{R}} r(u \gamma y)\,dF(y) \Big| \le \lambda \int_{\mathbb{R}} |r(u \gamma y)|\,dF(y) \le \lambda \int_{\mathbb{R}} (u \gamma y)^2 \alpha(|u \gamma y|)\,dF(y) \le \lambda u^2 \gamma^2 \alpha(|u \gamma| a) \int_{\mathbb{R}} y^2\,dF(y) = u^2 \gamma^2 \alpha(|u \gamma| a) \to 0, \quad \text{as } a \to 0 \ (\lambda \to \infty), \]

we have

\[ \lim_{\lambda \to \infty} \lambda \Big[ \int_{\mathbb{R}} \varphi(u(1 + \gamma y); t)\,dF(y) - \varphi(u; t) \Big] = \frac{(\gamma u)^2}{2} \frac{\partial^2 \varphi(u; t)}{\partial u^2}, \]

which yields (7.105). ▲

Fig. 7.32 Histograms and marginal densities of $X(t)$ at $t = 10$ for $\lambda = 12$, $\rho = 1$, $\sigma = 0$, and
$\gamma = 1$ (left panel) and $\lambda = 12$, $\rho = 2$, $\sigma = 0$, and $\gamma = 1$ (right panel)

Example 7.47 Let $X(t)$ be the solution of (7.102) with $\sigma = 0$, so that it satisfies the
stochastic differential equation $dX(t) = (\rho X(t-) - X(t-)^2)\,dt + \gamma X(t-)\,dC(t)$.
The heavy solid lines in Fig. 7.32 are Fourier transforms $f_s(x)$ of stationary solutions
$\varphi_s(u) = \lim_{t\to\infty} \varphi(u; t)$ of (7.104) for $\lambda = 12$, $\rho = 1$, $\sigma = 0$, and $\gamma = 1$ (left
panel), and $\lambda = 12$, $\rho = 2$, $\sigma = 0$, and $\gamma = 1$ (right panel). The stationary
characteristic functions $\varphi_s(u)$ are numerical solutions of (7.104) for these parameter
values. The histograms have been constructed from 1000 independent samples of
$X(t)$ at $t = 10$ defined by (7.102) with $\sigma = 0$ and $\gamma = 1$. They follow closely the
densities $f_s(x)$ and capture the phase transition phenomenon exhibited by $f_s(x)$.
The samples of $X(t)$ have been generated by the fixed time step integration scheme
in [57] using 1000 equal time steps in $[0, 10]$. Time $t = 10$ has been selected to
assure that $X(t)$ has essentially reached stationarity.

The heavy solid lines in Figs. 7.33 and 7.34 are the stationary densities of $X(t)$ in
(7.102) for $\rho = 3/4$ and $\rho = 2$, respectively. Both densities are for $\sigma = 1$ and $\gamma = 0$,
so that they correspond to $X(t)$ under Gaussian white noise, and have the expression
$f_s(x) = c\, x^{2(\rho/\sigma^2 - 1)} \exp(-2x/\sigma^2)$, $x > 0$, since $\rho > \sigma^2/2$. The histograms have
been obtained from 1000 independent samples of $X(t)$ at $t = 10$ defined by (7.102)
with $\sigma = 0$ and $\gamma = 1$, that is, $X(t)$ under Poisson white noise. The histograms in
Fig. 7.33 are for $\lambda = 5$ and $\rho = 3/4$ (left panel) and $\lambda = 30$ and $\rho = 3/4$ (right
panel). The histograms in Fig. 7.34 are for $\lambda = 5$ and $\rho = 2$ (left panel) and $\lambda = 30$
and $\rho = 2$ (right panel). The fixed time step scheme in [57] with 1000 steps in
$[0, 10]$ has been used to generate samples of $X(t)$. The histograms are consistent with
Theorem 7.25 in the sense that they differ from $f_s(x)$ for $\lambda = 5$ but follow closely
this density for $\lambda = 30$. ◊

Fig. 7.33 Histograms of $X(t)$ at $t = 10$ for $\lambda = 5$, $\rho = 3/4$, $\sigma = 0$, and $\gamma = 1$ (left panel), and
$\lambda = 30$, $\rho = 3/4$, $\sigma = 0$, and $\gamma = 1$ (right panel). Solid lines in both panels are the stationary
density of $X(t)$ for $\rho = 3/4$, $\sigma = 1$, and $\gamma = 0$

Fig. 7.34 Histograms of $X(t)$ at $t = 10$ for $\lambda = 5$, $\rho = 2$, $\sigma = 0$, and $\gamma = 1$ (left panel), and
$\lambda = 30$, $\rho = 2$, $\sigma = 0$, and $\gamma = 1$ (right panel). Solid lines in both panels are the stationary
density of $X(t)$ for $\rho = 2$, $\sigma = 1$, and $\gamma = 0$
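Samples of the Verhulst model under Poisson white noise can be generated with a simple fixed-step scheme. The sketch below is an Euler-type illustration and is not the integration scheme of [57], whose details are not given here; it assumes uniform jumps $Y_k \sim U(-a, a)$ with $a = \sqrt{3/\lambda}$, and the function name `simulate_verhulst_poisson` and its default arguments are hypothetical.

```python
import numpy as np


def simulate_verhulst_poisson(rho, gamma, lam, t_end=10.0, n_steps=1000,
                              n_samples=2000, x0=1.0, seed=0):
    """Samples of X(t_end) for dX = (rho*X - X**2) dt + gamma*X dC.

    C is a compound Poisson process with intensity lam and jumps
    uniform in (-a, a), a = sqrt(3/lam), so that lam*E[Y**2] = 1.
    """
    rng = np.random.default_rng(seed)
    dt = t_end / n_steps
    a = np.sqrt(3.0 / lam)
    x = np.full(n_samples, x0, dtype=float)
    for _ in range(n_steps):
        # Deterministic drift over one time step (explicit Euler).
        x += (rho * x - x**2) * dt
        # Number of jumps of each sample in this step.
        n_jumps = rng.poisson(lam * dt, size=n_samples)
        for j in range(int(n_jumps.max())):
            hit = n_jumps > j
            y = rng.uniform(-a, a, size=int(hit.sum()))
            # A jump Y multiplies the state by (1 + gamma*Y).
            x[hit] *= 1.0 + gamma * y
    return x


if __name__ == "__main__":
    samples = simulate_verhulst_poisson(rho=2.0, gamma=1.0, lam=30.0)
    print(samples.mean(), samples.std())
```

A histogram of the returned samples can be compared with the Gamma-type stationary density of the Gaussian case. For a large intensity the sample mean should be close to, though not equal to, the Gaussian-limit value $\rho - \sigma^2/2$ with $\sigma = \gamma$.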

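Theorem 7.26 If $\lim_{|u| \to \infty} u \varphi_s(u) = 0$, $\lim_{|u| \to \infty} u \varphi'_s(u) = 0$, and (7.104) admits
a unique stationary solution $\varphi_s(u) = \lim_{t\to\infty} \varphi(u; t)$, then the Fourier transform of
this equation, that is, the Fokker-Planck equation for the stationary density $f_s(x)$ of
$X(t)$, has the form

\[ -\frac{d}{dx} \big( (\rho x - x^2) f_s(x) \big) + \lambda \int_{\mathbb{R}} \frac{1}{1 + \gamma y} f_s\Big( \frac{x}{1 + \gamma y} \Big)\,dF(y) - \lambda f_s(x) = 0, \tag{7.106} \]

where $F$ denotes the distribution of $Y_1$.

Proof Calculations similar to those used for Theorem 7.4 give (7.106). For example,
the first term on the right side of the stationary version of (7.104) multiplied by
$\exp(-iux)/(2\pi)$ and integrated over the real line gives

\[ \frac{1}{2\pi} \int_{\mathbb{R}} \rho u\, e^{-iux} \varphi'_s(u)\,du = \frac{\rho}{2\pi} \Big[ u\, e^{-iux} \varphi_s(u) \Big|_{-\infty}^{\infty} - \int_{\mathbb{R}} \big( e^{-iux} - iux\, e^{-iux} \big) \varphi_s(u)\,du \Big] \]
\[ = -\frac{\rho}{2\pi} \Big[ \int_{\mathbb{R}} e^{-iux} \varphi_s(u)\,du - ix \int_{\mathbb{R}} e^{-iux} u \varphi_s(u)\,du \Big] = -\rho f_s(x) - \rho x f'_s(x) \]

using integration by parts and postulated boundary conditions for $\varphi_s(u)$. The integral
term on the right side of (7.104) yields

\[ \frac{\lambda}{2\pi} \int_{\mathbb{R}} e^{-iux} \Big[ \int_{\mathbb{R}} \varphi_s(u(1 + \gamma y))\,dF(y) \Big]\,du = \lambda \int_{\mathbb{R}} \Big[ \frac{1}{2\pi} \int_{\mathbb{R}} e^{-ivx/(1 + \gamma y)} \varphi_s(v) \frac{dv}{1 + \gamma y} \Big]\,dF(y) = \lambda \int_{\mathbb{R}} \frac{1}{1 + \gamma y} f_s\Big( \frac{x}{1 + \gamma y} \Big)\,dF(y) \]

by the change of variables $v = u(1 + \gamma y)$ and Fubini's theorem. ▲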


7.5.3 Solution of Uncertain Dynamic Systems by SROMs 

Methods for solving stochastic differential equations with random coefficients have 
been discussed in Sects. 7.3 and 7.4. In this section, we extend our considerations 
in Sect. 7.4.6 on the solution of this class of equations by stochastic reduced order 
models (SROMs). Numerical examples are presented to illustrate the construction of 
SROM-based solutions and assess their accuracy. Bounds are first established on the 
discrepancy between solutions of deterministic equations that have the same func- 
tional form but different coefficients. The bounds are then extended to characterize 
the accuracy of SROM-based solutions. 

Extended stochastic reduced order models (ESROMs) and methods for solving
SDEs by using these models are discussed in Sect. A.3.
ESROM-based solutions for stochastic algebraic and partial differential equations 
are presented in the following two chapters. 


7.5.3.1 Bounds on Discrepancy Between Solutions of Deterministic Systems 

Let $X(t)$ and $\tilde X(t)$ be $\mathbb{R}^d$-valued functions defined by

\[ \dot X(t) = f(t, X(t)), \quad t \ge 0, \]
\[ \dot{\tilde X}(t) = \tilde f(t, \tilde X(t)), \quad t \ge 0, \tag{7.107} \]

where $f, \tilde f : [0, \infty) \times \mathbb{R}^d \to \mathbb{R}^d$ are such that these equations have unique solutions.

We establish bounds on the discrepancy between $X(t)$ and $\tilde X(t)$.

Some notations are introduced prior to constructing these bounds. The logarithmic
norm of a $(d, d)$-matrix $Q$ with real-valued entries is

\[ \mu(Q) = \lim_{\rho \downarrow 0} \frac{\| I + \rho Q \| - 1}{\rho}, \tag{7.108} \]

where $I$ is the identity matrix and $\| \cdot \|$ is a subordinate matrix norm. If $\| \cdot \|$ is the
matrix norm subordinated to the Euclidean norm, then $\mu(Q)$ is the largest eigenvalue
of matrix $(Q + Q')/2$ ([58], Sect. 1.10). Let $\mu(\partial f(t, Z(t))/\partial X)$ be the logarithmic
norm of the $(d, d)$-matrix $\partial f(t, Z(t))/\partial X = \{\partial f_u(t, Z(t))/\partial z_v,\ u, v = 1, \ldots, d\}$,
where $Z(t)$ denotes a vector with coordinates in $R(t) = \times_{j=1}^{d} [X_j(t) \wedge \tilde X_j(t), X_j(t) \vee \tilde X_j(t)] \subset \mathbb{R}^d$
that exists by the mean value theorem ([59], Sect. 20). Let $\ell(t)$ be a
real-valued function of time such that $\mu(\partial f(t, Z(t))/\partial X) \le \ell(t)$, and let

\[ \delta(t) = \| f(t, \tilde X(t)) - \tilde f(t, \tilde X(t)) \| \tag{7.109} \]

be a measure of the difference between functions $f$ and $\tilde f$ on the solution $\tilde X(t)$.
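The definition (7.108) and its Euclidean-norm characterization can be compared numerically. The sketch below approximates the limit with a small step and evaluates the largest eigenvalue of the symmetric part; the function names and the step value are hypothetical choices made for illustration.

```python
import numpy as np


def log_norm_limit(q, h=1.0e-7):
    """Finite-step approximation of (7.108) with the spectral norm."""
    d = q.shape[0]
    return (np.linalg.norm(np.eye(d) + h * q, 2) - 1.0) / h


def log_norm_euclidean(q):
    """Largest eigenvalue of the symmetric part (q + q')/2."""
    return float(np.linalg.eigvalsh((q + q.T) / 2.0).max())


if __name__ == "__main__":
    q = np.array([[-1.0, 0.5],
                  [0.0, -2.0]])
    print(log_norm_limit(q), log_norm_euclidean(q))
```

Unlike a norm, the logarithmic norm can be negative, as for the matrix in this example, which is what allows the bounds in this section to decay in time.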

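Theorem 7.27 The discrepancy between the solutions of the differential equations
in (7.107) can be bounded by

\[ \| X(t) - \tilde X(t) \| \le e^{L(t)} \Big( \| X(0) - \tilde X(0) \| + \int_0^t e^{-L(s)} \delta(s)\,ds \Big), \tag{7.110} \]

where $L(t) = \int_0^t \ell(s)\,ds$ and $\| \cdot \|$ denotes the Euclidean norm.

Proof If $L(t) < 0$ at all times, then the right side of (7.110) can be used as a measure
of the discrepancy between $X(t)$ and $\tilde X(t)$ provided $\delta(t)$ is bounded. If $X(t)$ and $\tilde X(t)$
coincide at the initial time, then (7.110) becomes

\[ \| X(t) - \tilde X(t) \| \le e^{L(t)} \int_0^t e^{-L(s)} \delta(s)\,ds. \tag{7.111} \]

The measure $m(t) = \| X(t) - \tilde X(t) \|$ of the discrepancy between $X(t)$ and $\tilde X(t)$
calculated at a later time $t + \Delta t$, $\Delta t > 0$, satisfies the following inequality

\[ m(t + \Delta t) = \| X(t + \Delta t) - \tilde X(t + \Delta t) \| \]
\[ \le \| X(t) + \dot X(t) \Delta t - \tilde X(t) - \dot{\tilde X}(t) \Delta t \| + O((\Delta t)^2) \]
\[ = \| X(t) - \tilde X(t) + \Delta t \big( f(t, X(t)) - \tilde f(t, \tilde X(t)) \big) \| + O((\Delta t)^2) \]
\[ \le \| X(t) - \tilde X(t) + \Delta t \big( f(t, X(t)) - f(t, \tilde X(t)) \big) \| + \Delta t \| f(t, \tilde X(t)) - \tilde f(t, \tilde X(t)) \| + O((\Delta t)^2) \]
\[ \le \Big\| I + \Delta t \frac{\partial f(t, Z(t))}{\partial X} \Big\|\, m(t) + \Delta t\, \delta(t) + O((\Delta t)^2) \]
\[ \le \max_{Z(t) \in R(t)} \Big\| I + \Delta t \frac{\partial f(t, Z(t))}{\partial X} \Big\|\, m(t) + \Delta t\, \delta(t) + O((\Delta t)^2) \]

for a small time step $\Delta t > 0$. An alternative form of this inequality is

\[ \frac{m(t + \Delta t) - m(t)}{\Delta t} \le \frac{1}{\Delta t} \Big( \max_{Z(t) \in R(t)} \Big\| I + \Delta t \frac{\partial f(t, Z(t))}{\partial X} \Big\| - 1 \Big) m(t) + \delta(t) + O(\Delta t). \]

This inequality and Theorem 10.6 in [58] yield (7.110). ▲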

Theorem 7.28 If the differential equations (7.107) are linear, that is, they have the
form

\[ \dot X(t) = A(t) X(t) + Y(t), \quad t \ge 0, \]
\[ \dot{\tilde X}(t) = \tilde A(t) \tilde X(t) + \tilde Y(t), \quad t \ge 0, \tag{7.112} \]

with initial states $X(0) = X_0$ and $\tilde X(0) = \tilde X_0$, then

\[ \| X(t) - \tilde X(t) \| \le e^{L(t)} \Big( \| X_0 - \tilde X_0 \| + \int_0^t e^{-L(s)} \delta(s)\,ds \Big), \tag{7.113} \]

where

\[ L(t) = \int_0^t \lambda_{\max}(s)\,ds; \]

$\lambda_{\max}(t)$ = the largest eigenvalue of $[A(t) + A(t)']/2$; and

\[ \delta(t) = \| (A(t) - \tilde A(t)) \tilde X(t) + (Y(t) - \tilde Y(t)) \|. \tag{7.114} \]

Proof Theorem 7.27 and the definition of the logarithmic norm yield the inequality
in (7.113). If $A$ is time invariant, then

\[ \| X(t) - \tilde X(t) \| \le e^{\lambda_{\max} t} \Big( \| X_0 - \tilde X_0 \| + \int_0^t e^{-\lambda_{\max} s} \delta(s)\,ds \Big), \tag{7.115} \]

where $\lambda_{\max}$ is the largest eigenvalue of $(A + A')/2$. If $\lambda_{\max} < 0$ and there exists
a constant $M^* > 0$ such that $\delta(t) \le M^*$ at all times, then $\| X(t) - \tilde X(t) \| \le \| X_0 - \tilde X_0 \| - M^*/\lambda_{\max}$
at all times. If $\lambda_{\max} > 0$, the resulting bound is less useful
for our objective since it increases exponentially in time, resembling the behavior of
bounds derived from Gronwall's inequality ([60], Sect. 10.2). ▲

Example 7.48 Let $X(t)$ and $\tilde X(t)$ be real-valued functions defined by (7.112) with
$d = 1$, $A(t) = -1 + 0.3 \sin(t)$, $Y(t) = 1$, $\tilde A = -1$, $\tilde Y(t) = 1$, and $X(0) = \tilde X(0) = 0$.
We have $\lambda_{\max}(t) = A(t)$, so that $\ell(t) = A(t)$ and $L(t) = \int_0^t A(s)\,ds = -t + 0.3(1 - \cos(t))$.
Since $\tilde X(t) = 1 - \exp(-t)$, we have $\delta(t) = |(A(t) - \tilde A) \tilde X(t)| = |0.3 \sin(t)(1 - \exp(-t))|$.
The solid and dotted lines in Fig. 7.35 (left panel) are the
solutions $X(t)$ and $\tilde X(t)$. The solid and dotted lines in Fig. 7.35 (right panel) are the
actual discrepancy $|X(t) - \tilde X(t)|$ and the bound on $|X(t) - \tilde X(t)|$ in (7.110). The
bound is remarkably tight. ◊

Fig. 7.35 Solutions $X(t)$ and $\tilde X(t)$ in solid and dotted lines (left panel), and $|X(t) - \tilde X(t)|$ and a
bound on $|X(t) - \tilde X(t)|$ in solid and dotted lines (right panel)
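The quantities in Example 7.48 can be reproduced with a few lines of code. The sketch below integrates the equation for $X(t)$ with a classical Runge-Kutta scheme, uses the closed form $\tilde X(t) = 1 - \exp(-t)$, and evaluates the bound (7.110) with a trapezoidal rule; the function name `example_7_48`, the grid size, and the choice of integrators are assumptions made for illustration.

```python
import numpy as np


def example_7_48(t_end=10.0, n=20001):
    """Return the grid, X(t), X~(t), and the bound (7.110) for Example 7.48."""
    t = np.linspace(0.0, t_end, n)
    h = t[1] - t[0]

    def rhs(s, x):
        # dX/dt = A(t) X + Y(t) with A(t) = -1 + 0.3 sin(t) and Y(t) = 1.
        return (-1.0 + 0.3 * np.sin(s)) * x + 1.0

    x = np.zeros(n)
    for k in range(n - 1):
        # Classical fourth-order Runge-Kutta step.
        k1 = rhs(t[k], x[k])
        k2 = rhs(t[k] + h / 2.0, x[k] + h * k1 / 2.0)
        k3 = rhs(t[k] + h / 2.0, x[k] + h * k2 / 2.0)
        k4 = rhs(t[k] + h, x[k] + h * k3)
        x[k + 1] = x[k] + h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0

    x_tilde = 1.0 - np.exp(-t)                      # solution for A~ = -1
    big_l = -t + 0.3 * (1.0 - np.cos(t))            # L(t)
    delta = np.abs(0.3 * np.sin(t) * x_tilde)       # delta(t)
    g = np.exp(-big_l) * delta
    integral = np.concatenate(([0.0], np.cumsum(0.5 * h * (g[1:] + g[:-1]))))
    bound = np.exp(big_l) * integral
    return t, x, x_tilde, bound


if __name__ == "__main__":
    t, x, x_tilde, bound = example_7_48()
    print(np.max(np.abs(x - x_tilde)), np.max(bound))
```

In this scalar case the bound coincides with the actual discrepancy as long as $\sin(t)$ does not change sign, that is, up to $t = \pi$, which is one reason the bound appears so tight in the figure.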


7.5.3.2 SROM-Based Solutions 


Suppose matrix $A(t)$ and/or input $Y(t)$ in (7.112) are random and that $A(t) = A$
is time invariant. Let $Z$ be a random element collecting the random entries in the
defining equation of $X(t)$, and assume that $Z$ can be characterized by $n$ independent
samples $(z_1, \ldots, z_n)$. Let $(\tilde z_1, \ldots, \tilde z_m)$ and $(p_1, \ldots, p_m)$, $m \ll n$, be the defining
parameters for a SROM $\tilde Z$ of $Z$. Denote by $\{X_i, A_i, Y_i\}$ and $\{\tilde X_k, \tilde A_k, \tilde Y_k\}$ samples
of $(X, A, Y)$ corresponding to $\{z_i\}$ and $\{\tilde z_k\}$. The discrepancy between $X_i$ and $\tilde X_k$
can be bounded by (Theorem 7.28)

\[ \| \tilde X_k(t) - X_i(t) \| \le e^{\lambda_{i,\max} t} \int_0^t e^{-\lambda_{i,\max} s} \delta_{k,i}(s)\,ds, \tag{7.116} \]

where $\delta_{k,i}(t) = \| (A_i - \tilde A_k) \tilde X_k(t) + (Y_i(t) - \tilde Y_k(t)) \|$ and $\lambda_{i,\max}$ is the largest
eigenvalue of $(A_i + A_i')/2$.

The inequalities in (7.116) can be used to construct bounds on moments of $\| X(t) - \tilde X(t) \|$.
Let $h$ be a measurable function defined by a partition $\{\mathcal{C}_k\}$, $k = 1, \ldots, m$, of
$(z_1, \ldots, z_n)$ such that $h(z_i) = \tilde z_k$ for $z_i \in \mathcal{C}_k$ and $p_k = n_k/n$, where $n_k$ denotes the
cardinality of $\mathcal{C}_k$ (Sect. A.3). Accordingly, the moment of order $q$ of the discrepancy
$\| X(t) - \tilde X(t) \|$ can be bounded by

E[ ||X(f)-X(f)|| ? ] 



(7.117) 

Similar bounds can be developed for other moments of \\X(t) — X (f )|| . 

Numerical illustrations in the remainder of this section are based on developments
in [61], and include dynamic systems with random coefficients and/or input.

Example 7.49 Let $X(t)$, $t \ge 0$, be defined by $\dot X(t) = -\lambda X(t) + Y(t)^2$, where
$\dot Y(t) = -\eta Y(t) + (2\eta)^{1/2} W(t)$, $\lambda, \eta > 0$, and $W(t)$ is a Gaussian white noise
viewed as the formal derivative of a Brownian motion $B(t)$. Then $(X, Y)$ is a diffusion
process satisfying the stochastic differential equation

\[ d \begin{bmatrix} X(t) \\ Y(t) \end{bmatrix} = \begin{bmatrix} -\lambda X(t) + Y(t)^2 \\ -\eta Y(t) \end{bmatrix} dt + \begin{bmatrix} 0 \\ (2\eta)^{1/2} \end{bmatrix} dB(t). \tag{7.118} \]

It is assumed that $Y(0)$ is independent of $B$ and that $Y(0) \sim N(0, 1)$, so that $Y(t)$ has
a stationary start.

The linear random vibration theory can be applied to calculate second moment
properties of $X(t)$ defined by $\dot X(t) = -\lambda X(t) + Y(t)^2$ since the mean and covariance
functions of $Z(t) = Y(t)^2$ can be calculated. For example, the stationary first two
moments of $X(t)$ are $\mu(1, 0) = 1/\lambda$ and $\mu(2, 0) = (3\lambda + 2\eta)/[\lambda^2 (\lambda + 2\eta)]$, so that
its stationary variance is $\sigma_X^2 = 2/[\lambda(\lambda + 2\eta)]$ (Examples 7.3 and 7.4).
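The stationary moments quoted above and the way a SROM approximates moments can be illustrated as follows. The sketch assumes the moment formulas stated in the example; the functions `stationary_moments` and `srom_moment` are hypothetical helpers, and `srom_moment` treats the SROM at a fixed time as a discrete random variable with values $\tilde z_k$ and probabilities $p_k$. The probabilities listed are those reported in the example for $m = 10$.

```python
import numpy as np

# Probabilities of the m = 10 SROM samples reported in Example 7.49.
P_EXAMPLE = [0.0043, 0.0110, 0.1821, 0.2222, 0.0391,
             0.0091, 0.3076, 0.0892, 0.1353, 0.0001]


def stationary_moments(lam, eta):
    """Stationary mean, second moment, and variance of X(t)."""
    m1 = 1.0 / lam
    m2 = (3.0 * lam + 2.0 * eta) / (lam**2 * (lam + 2.0 * eta))
    return m1, m2, m2 - m1**2


def srom_moment(samples, probs, q):
    """Moment of order q of a SROM with given samples and probabilities."""
    samples = np.asarray(samples, dtype=float)
    probs = np.asarray(probs, dtype=float)
    return float(np.sum(probs * samples**q))


if __name__ == "__main__":
    print(stationary_moments(1.0, 1.0))   # mean 1, second moment 5/3, variance 2/3
    print(sum(P_EXAMPLE))                 # the probabilities add up to one
```

The variance returned by `stationary_moments` agrees with the closed form $2/[\lambda(\lambda + 2\eta)]$, which provides a quick consistency check of the two moment expressions.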

SROMs can be used to calculate approximately properties of $X(t)$. The following
illustration is for $\lambda = \eta = 1$ and a SROM $\tilde Z(t)$ with $m = 10$ samples consisting
of samples of $Z(t) = Y(t)^2$ generated in $[0, 4]$. Figure 7.36 shows the samples of
$\tilde Z(t)$ in the time interval $[1, 4]$ in which $Z(t)$ is assumed to be stationary. The
probabilities of these samples are $p_1 = 0.0043$, $p_2 = 0.0110$, $p_3 = 0.1821$, $p_4 = 0.2222$,
$p_5 = 0.0391$, $p_6 = 0.0091$, $p_7 = 0.3076$, $p_8 = 0.0892$, $p_9 = 0.1353$, and
$p_{10} = 0.0001$. The absolute value of the errors for the first six moments of $\tilde Z(t)$ is
under 2%. The solid and dotted lines in Fig. 7.37 (left panel) are the scaled covariance
function of $\tilde Z(t)$ and the scaled target covariance function $c_z(\tau)/c_z(0)$, where
$c_z(\tau) = 2 \exp(-2\eta|\tau|)$ is the covariance function of $Z(t) = Y(t)^2$. Figure 7.37
(right panel) shows estimates of $c_z(\tau)/c_z(0)$ obtained from sets of ten independent
samples of $Z(t)$. In contrast to the SROM-based approximation for $c_z(\tau)/c_z(0)$ with
$m = 10$, which is accurate, Monte Carlo estimates of $c_z(\tau)/c_z(0)$ are unstable and
can have large errors.

The absolute values of the errors for the first six moments of $\tilde X(t)$ are 2.80, 2.96,
9.04, 16.02, 24.92, and 35.57 in percentages. These errors are calculated with respect
to the exact solution of the moment equations for $X(t)$. While these errors are larger
than those for $\tilde Z(t)$, the performance of $\tilde X(t)$ is satisfactory given that this process
has only $m = 10$ samples. The solid and dotted lines in Fig. 7.38 (left panel) are the
scaled covariance function of $\tilde X(t)$ and the target covariance function $c_x(\tau)/c_x(0)$.
Figure 7.38 (right panel) shows estimates of $c_x(\tau)/c_x(0)$ obtained from sets of ten
independent samples of $Z(t)$. The behavior of these estimates is similar to that of the
estimates in Fig. 7.37 (right panel).

The accuracy of SROM-based solutions can be improved by increasing model
size. For example, a SROM $\tilde X(t)$ of $X(t)$ corresponding to $\tilde Z(t)$ with $m = 14$ is
superior to $\tilde X(t)$ considered in previous figures. The covariance functions of $\tilde X(t)$ with
$m = 14$ and $m = 10$ are similar, but higher order moments of $\tilde X(t)$ with $m = 14$
are significantly more accurate than those of $\tilde X(t)$ with $m = 10$. The absolute values
of the errors of the first six moments of $\tilde X(t)$ with $m = 14$ are under 7.5%.

Temporal averages for properties of $\tilde X(t)$ have been used in the previous evaluations
for the following two reasons. First, simple processes, that is, processes having
a finite number of outcomes, cannot be stationary. This is consistent with the fact that
estimates of stationary processes based on a finite number of samples are not invariant
to time shift. Second, temporal averages of statistics of simple processes are analogous
to the assumption of random start. For example, the temporal average of the
mean of $\tilde Z(t)$ in a time interval $[\tau_0, \tau_0 + \tau]$ is

\[ \frac{1}{\tau} \int_{\tau_0}^{\tau_0 + \tau} E[\tilde Z(t)]\,dt = \frac{1}{\tau} \int_{\tau_0}^{\tau_0 + \tau} \sum_{k=1}^{m} p_k \tilde z_k(t)\,dt = \sum_{k=1}^{m} p_k \Big( \frac{1}{\tau} \int_0^{\tau} \tilde z_k(\tau_0 + u)\,du \Big), \]

where the latter integral can be interpreted as the expectation of $\tilde z_k$ corresponding to
a random uniformly distributed start in $(0, \tau)$. ◊

Fig. 7.36 Samples of SROM $\tilde Z(t)$, $m = 10$, of $Z(t)$

Fig. 7.37 Scaled covariance functions of $\tilde Z(t)$, $m = 10$, and $Z(t)$ in solid and dotted lines (left
panel) and Monte Carlo estimates of $c_z(\tau)/c_z(0)$ for sets of ten samples of $Z(t)$ (right panel)

Fig. 7.38 Scaled covariance functions of $\tilde{X}(t)$, m = 10, and X(t) in solid and dotted lines (left
panel) and Monte Carlo estimates of $c_x(\tau)/c_x(0)$ for sets of ten samples of X(t) (right panel)

Example 7.50 System responses $X(t) = X_0 e^{-\lambda t} + \int_0^t e^{-\lambda(t-s)} Z(s)\,ds$ and
$\tilde{X}(t) = X_0 e^{-\lambda t} + \int_0^t e^{-\lambda(t-s)} \tilde{Z}(s)\,ds$ can be used directly to bound the discrepancy
$|\tilde{X}(t) - X(t)|$ between the processes in the previous example. The bound,

$$|\tilde{X}(t) - X(t)| \le e^{-\lambda t}\int_0^t e^{\lambda s}\,|\tilde{Z}(s) - Z(s)|\,ds,$$

derived from the expressions of X(t) and $\tilde{X}(t)$ coincides with that in (7.115). Note
also that

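$$E[|\tilde{X}(t) - X(t)|] \le e^{-\lambda t}\int_0^t e^{\lambda s}\,E[|\tilde{Z}(s) - Z(s)|]\,ds \le \frac{M}{\lambda}\left(1 - e^{-\lambda\tau}\right)$$

at any time $t \in [0, \tau]$, where $M = \max_{0 \le t \le \tau} E[|\tilde{Z}(t) - Z(t)|]$. If Z(t) is represented
by n independent samples $z_i(t)$ that are grouped in m clusters $\mathcal{C}_k$, k = 1, ..., m, centered
on the samples $\tilde{z}_k(t)$ of $\tilde{Z}(t)$, then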


$$E[|\tilde{Z}(t) - Z(t)|] \simeq \frac{1}{n}\sum_{k=1}^{m}\sum_{z_i \in \mathcal{C}_k} |\tilde{z}_k(t) - z_i(t)|.$$

The graph of $E[|\tilde{Z}(t) - Z(t)|]$ in Fig. 7.39 is based on n = 500 independent samples
of Z(t). The corresponding constant M is approximately equal to 1.78 for $\tau = 4$ and
the SROM $\tilde{Z}(t)$ of Z(t) with m = 10 described in Figs. 7.36 and 7.37. ◊

Example 7.51 Consider the stochastic differential equation

$$dX(t) = -A\,X(t)\,dt + dB(t), \quad t \ge 0, \tag{7.119}$$

where B(t) is a standard Brownian motion and A > 0 is a random variable that is
independent of B(t). Let $\tilde{A}$ be a SROM of A and denote by $\tilde{X}(t)$ the solution of
(7.119) with A replaced by $\tilde{A}$. We develop bounds on the discrepancy between $\tilde{X}(t)$
and X(t) using the fact that X(t) | A is a Gaussian process.

Numerical results are for A ~ U(1, 2), a SROM $\tilde{A}$ for A with m = 5 samples,
and X(0) = 1. The errors between the first six moments of $\tilde{A}$ and A are under 2%.
The solid and dotted lines in Fig. 7.40 (left panel) are the variance functions of $\tilde{X}(t)$
and X(t), respectively. Figure 7.40 (right panel) shows Monte Carlo estimates of the
variance of X(t) based on sets of five independent samples of A. The Monte Carlo
estimates show a significant sample to sample variation. The plots in Fig. 7.41 are
similar to those in Fig. 7.40 but for central moments of order 4 of $\tilde{X}(t)$ with m = 5
and X(t). These moments have the same behavior as the variances in Fig. 7.40. Monte
Carlo estimates based on sets of five independent samples of A are unstable, and can
be inaccurate.

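The discrepancy between $\tilde{X}(t)$ and X(t) can be measured by, for example,

$$E[|\tilde{X}(t) - X(t)|] \le \frac{1}{n}\sum_{k=1}^{m}\sum_{\lambda_i \in \mathcal{C}_k} e^{-\lambda_i t}\,|\tilde{\lambda}_k - \lambda_i| \int_0^t e^{\lambda_i s}\,E[|\tilde{X}_k(s)|]\,ds, \tag{7.120}$$

where $(\lambda_1, \ldots, \lambda_n)$ are independent samples of A and $(\tilde{\lambda}_k, p_k)$, k = 1, ..., m, are
the defining parameters for $\tilde{A}$. The sets $\{\mathcal{C}_k, k = 1, \ldots, m\}$ partition $(\lambda_1, \ldots, \lambda_n)$
such that $p_k = n_k/n$, where $n_k$ denotes the cardinality of $\mathcal{C}_k$. Since the Gaussian
variable $\tilde{X}_k(t)$ has mean 0 and variance $\tilde{\sigma}_k(t)^2 = (1 - \exp(-2\tilde{\lambda}_k t))/(2\tilde{\lambda}_k)$, we have
$E[|\tilde{X}_k(t)|] = \sqrt{2/\pi}\,\tilde{\sigma}_k(t)$ and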


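$$E[|\tilde{X}(t) - X(t)|] \le \frac{1}{n}\sum_{k=1}^{m}\sum_{\lambda_i \in \mathcal{C}_k} e^{-\lambda_i t}\,|\tilde{\lambda}_k - \lambda_i| \int_0^t e^{\lambda_i s}\sqrt{\frac{1 - e^{-2\tilde{\lambda}_k s}}{\pi\tilde{\lambda}_k}}\,ds. \tag{7.121}$$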

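Similar considerations apply to the differential equations for the first two moments
of X(t) and $\tilde{X}(t)$, that is, the means $\dot{\mu}_i(t) = -\lambda_i\mu_i(t)$, $\dot{\tilde{\mu}}_k(t) = -\tilde{\lambda}_k\tilde{\mu}_k(t)$, the
variances $\dot{\gamma}_i(t) = -2\lambda_i\gamma_i(t) + 1$, $\dot{\tilde{\gamma}}_k(t) = -2\tilde{\lambda}_k\tilde{\gamma}_k(t) + 1$, and the covariance functions
$\partial c_i(s,t)/\partial t = -\lambda_i c_i(s,t)$ and $\partial\tilde{c}_k(s,t)/\partial t = -\tilde{\lambda}_k\tilde{c}_k(s,t)$, $s < t$,
where $\mu_i, \tilde{\mu}_k$; $\gamma_i, \tilde{\gamma}_k$; and $c_i, \tilde{c}_k$ denote the mean, the variance, and the covariance
functions corresponding to A set equal to $\lambda_i, \tilde{\lambda}_k$. For example, the discrepancy
between the mean functions $\tilde{\mu}_k$ and $\mu_i$ is such that

$$|\tilde{\mu}_k(t) - \mu_i(t)| \le |\mu_0|\,\mathrm{sign}(\tilde{\lambda}_k - \lambda_i)\left(e^{-\lambda_i t} - e^{-\tilde{\lambda}_k t}\right)$$

by the bound in (7.115), where $\mu_0$ denotes the mean value of the initial state X(0).
The expectation of the discrepancy between the random functions $\tilde{\mu}(t)$ and $\mu(t)$
with outcomes $\tilde{\mu}_k(t)$ of probabilities $p_k$, k = 1, ..., m, and $\mu_i(t)$ of probability
1/n, i = 1, ..., n, respectively, can be obtained from

$$E[|\tilde{\mu}(t) - \mu(t)|] \le \frac{|\mu_0|}{n}\sum_{k=1}^{m}\sum_{\lambda_i \in \mathcal{C}_k} \mathrm{sign}(\tilde{\lambda}_k - \lambda_i)\left(e^{-\lambda_i t} - e^{-\tilde{\lambda}_k t}\right).$$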


Fig. 7.39 An estimate of $E[|\tilde{Z}(t) - Z(t)|]$ for a SROM $\tilde{Z}(t)$ with m = 10

Numerical results are for $\tilde{\lambda}_k = 0.9 + 0.2k$, $p_k = 1/m$, k = 1, ..., m, and m = 5.
Let $I_k = [1 + 0.2(k-1), 1 + 0.2k)$ be a partition of the range [1, 2] of A and $h: \mathbb{R} \to \mathbb{R}$
a measurable function such that $h(I_k) = \tilde{\lambda}_k$ and $P(A \in I_k) = p_k$, k = 1, ..., m.
If A ~ U(1, 2) is described by its distribution function rather than independent
samples, we have

$$E[|\tilde{\mu}(t) - \mu(t)|] \le |\mu_0| \sum_{k=1}^{m} \int_{I_k} \mathrm{sign}(\tilde{\lambda}_k - \lambda)\left(e^{-\lambda t} - e^{-\tilde{\lambda}_k t}\right)d\lambda = \frac{|\mu_0|}{t}\sum_{k=1}^{m}\left(e^{-(1+0.2(k-1))t} + e^{-(1+0.2k)t} - 2e^{-(0.9+0.2k)t}\right).$$

Figure 7.42 shows this upper bound on $E[|\tilde{\mu}(t) - \mu(t)|]$ in [0, 4] for $|\mu_0| = 1$. ◊
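
The bound on the discrepancy between the mean functions can be checked numerically. The following Python sketch is not part of the original example; the function names and the quadrature grid are illustrative assumptions. It evaluates $|\mu_0|\sum_k \int_{I_k} |e^{-\lambda t} - e^{-\tilde{\lambda}_k t}|\,d\lambda$ for A ~ U(1, 2), with m intervals of equal width and SROM samples at the interval midpoints (0.9 + 0.2k for m = 5), once with the integral over each $I_k$ carried out in closed form and once by trapezoidal quadrature.

```python
import numpy as np

def mean_discrepancy_bound(t, mu0=1.0, m=5):
    """Closed-form value of |mu0| * sum_k int_{I_k} |e^{-lam t} - e^{-lam_k t}| dlam."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    k = np.arange(1, m + 1)[:, None]
    width = 1.0 / m
    left = 1.0 + width * (k - 1)      # left end of I_k
    right = 1.0 + width * k           # right end of I_k
    center = left + width / 2.0       # SROM sample (0.9 + 0.2 k when m = 5)
    terms = np.exp(-left * t) + np.exp(-right * t) - 2.0 * np.exp(-center * t)
    return abs(mu0) * terms.sum(axis=0) / t

def mean_discrepancy_quadrature(t, mu0=1.0, m=5, n_grid=200001):
    """Same quantity for a scalar t > 0, by trapezoidal quadrature over [1, 2]."""
    lam = np.linspace(1.0, 2.0, n_grid)
    width = 1.0 / m
    idx = np.minimum((lam - 1.0) // width, m - 1)   # interval containing lam
    center = 1.0 + width * (idx + 0.5)
    f = np.abs(np.exp(-lam * t) - np.exp(-center * t))
    return abs(mu0) * np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(lam))

ts = np.array([0.5, 1.0, 2.0, 4.0])
closed = mean_discrepancy_bound(ts)
numeric = np.array([mean_discrepancy_quadrature(t) for t in ts])
print(np.column_stack([ts, closed, numeric]))
```

The two columns of values should agree to within the quadrature error, which supports the closed-form expression under the stated assumptions on $I_k$ and $\tilde{\lambda}_k$.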

Example 7.52 Suppose the coefficient $\lambda$ in Example 7.49 is a real-valued random
variable A > 0, so that X(t) satisfies a stochastic equation with random coefficients
driven by a non-Gaussian process $Z(t) = Y(t)^2$. Let $\tilde{Z}(t)$ be a SROM for Z(t) with
samples $(\tilde{z}_1, \ldots, \tilde{z}_m)$ of probabilities $(p_1, \ldots, p_m)$ and let $\tilde{A}$ be a SROM for A with
samples $(\tilde{\lambda}_1, \ldots, \tilde{\lambda}_{\tilde{m}})$ of probabilities $(q_1, \ldots, q_{\tilde{m}})$, where $\tilde{m}$ may or may not be equal
to m. Numerical results are for X(t) in stationary regime, $\eta = 1$, A ~ U(1, 2), m = 10,
and $\tilde{m} = 5$, so that the SROM $\tilde{X}(t)$ for X(t) has 50 samples of probabilities
$p_k q_l$, k = 1, ..., m, l = 1, ..., $\tilde{m}$. As previously, temporal averages of properties
of $\tilde{X}(t)$ are compared with corresponding properties of X(t).

The absolute values of the errors of the first six moments of $\tilde{X}(t)$ relative to Monte
Carlo estimates of these moments based on 1000 samples are 3.43, 1.08, 5.74, 12.10,
21.02, and 31.97 in percentages. The solid and dotted lines in Fig. 7.43 (left panel)
are the scaled covariance functions of $\tilde{X}(t)$ and X(t) for time lags in the range [0, 1].
Fig. 7.43 (right panel) shows Monte Carlo estimates of the scaled covariance function
$c_x(\tau)/c_x(0)$ of X(t) based on 5 samples of A and 10 samples of Z(t). The estimates
exhibit notable sample to sample variation, and can have large errors. The accuracy
of $\tilde{X}(t)$ can be improved by increasing its size. For example, the absolute errors of
the first six moments of $\tilde{X}(t)$ corresponding to $\tilde{Z}(t)$ and $\tilde{A}$ with m = 14 and $\tilde{m} = 5$
do not exceed 8%.

Fig. 7.40 Variance functions of $\tilde{X}(t)$, m = 5, and X(t) in solid and dotted lines (left panel) and
Monte Carlo estimates of the variance function of X(t) for sets of five samples of X(t) (right panel)

As in the previous examples, properties of $\tilde{X}(t)$ can be obtained by elementary
calculations. For example, the moment of order $r \ge 1$ at a time t and the correlation
function of $\tilde{X}(t)$ at times s and t are

$$E[\tilde{X}(t)^r] = \sum_{l=1}^{\tilde{m}}\left(\sum_{k=1}^{m} \left(\tilde{x}_{k,l}(t)\right)^r p_k\right) q_l,$$

$$E[\tilde{X}(s)\tilde{X}(t)] = \sum_{l=1}^{\tilde{m}}\left(\sum_{k=1}^{m} \tilde{x}_{k,l}(s)\,\tilde{x}_{k,l}(t)\,p_k\right) q_l, \tag{7.122}$$

where $\tilde{x}_{k,l}(t)$ is the solution of (7.118) with A and Z(t) set equal to $\tilde{\lambda}_l$ and $\tilde{z}_k(t)$,
respectively. ◊

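Proof The discrepancy between the solutions $\tilde{X}_{k,l}(t)$ and $X_{i,j}(t)$ defined by (7.118)
with (A, Z(t)) replaced with $(\tilde{\lambda}_l, \tilde{z}_k(t))$ and $(\lambda_j, z_i(t))$ is

$$|\tilde{X}_{k,l}(t) - X_{i,j}(t)| \le e^{-\lambda_j t}\int_0^t e^{\lambda_j s}\,\delta_{ij,kl}(s)\,ds,$$

where $\delta_{ij,kl}(s) = |(\tilde{\lambda}_l - \lambda_j)\,\tilde{x}_{k,l}(s) + (z_i(s) - \tilde{z}_k(s))|$. These inequalities can be
used to bound moments of $|\tilde{X}(t) - X(t)|$. For example,

$$E[|\tilde{X}(t) - X(t)|] \le \frac{1}{\tilde{n}}\sum_{l=1}^{\tilde{m}}\sum_{\lambda_j \in \mathcal{D}_l}\left(\frac{1}{n}\sum_{k=1}^{m}\sum_{z_i \in \mathcal{C}_k} e^{-\lambda_j t}\int_0^t e^{\lambda_j s}\,\delta_{ij,kl}(s)\,ds\right), \tag{7.123}$$

where $\mathcal{C}_k$, k = 1, ..., m, and $\mathcal{D}_l$, l = 1, ..., $\tilde{m}$, denote partitions of the sets of
independent samples $\{z_i(t)\}$ and $\{\lambda_j\}$, of sizes n and $\tilde{n}$, used to represent Z(t) and A. ▲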
Fig. 7.41 Central moments of order 4 of $\tilde{X}(t)$, m = 5, and X(t) in solid and dotted lines (left panel)
and Monte Carlo estimates of the central moment of order 4 of X(t) for sets of five samples of X(t)
(right panel)

Fig. 7.42 Bound on $E[|\tilde{\mu}(t) - \mu(t)|]$ for m = 10 and n = 500

7.5.4 Degrading Systems 

Fig. 7.43 Scaled covariance functions of $\tilde{X}(t)$, m = 10 and $\tilde{m} = 5$, and X(t) in solid and dotted
lines (left panel) and Monte Carlo estimates of $c_x(\tau)/c_x(0)$ for sets of five independent samples of
A and ten independent samples of X(t) (right panel)

Suppose a through rectilinear crack of length $2a_0$ is detected in the wing of an aircraft.
Our objective is to find the probability that the crack length does not exceed a critical
value $a_{cr} > 0$ during the time $\tau$ between scheduled inspections. It is assumed that
the action on the wing can be modeled by a stationary broad band Gaussian process,
the crack does not affect the overall wing behavior, the wing is a linear system whose
dynamics can be captured by its first mode of vibration, and the stress process X(t) normal
to the crack controls its growth. Under these assumptions, X(t) is the solution of

$$\ddot{X}(t) + 2\zeta\nu_0\dot{X}(t) + \nu_0^2 X(t) = \varepsilon W(t), \quad t \ge 0, \tag{7.124}$$

where W(t) is a Gaussian white noise with mean 0 and constant spectral density of
magnitude $s_0$, $\varepsilon > 0$ is a small parameter, $\nu_0 > 0$ denotes the frequency of the
first mode, and $\zeta = \varepsilon^2 \in (0, 1)$ is the damping ratio in this mode. For a sufficiently
large time, X(t) is a stationary narrow band Gaussian process with mean 0, variance
$\sigma_{st}^2 = \pi s_0/(2\nu_0^3)$, and spectral density $s(\nu) = s_0/[(\nu^2 - \nu_0^2)^2 + (2\zeta\nu\nu_0)^2]$, that
admits the representation

$$X(t) = H(t)\cos(\nu_0 t + \Psi(t)), \tag{7.125}$$

that is, X(t) is a harmonic with frequency $\nu_0$ and random phase $\Psi(t)$ that is modulated
by a random amplitude H(t) ([62], Sect. 14.4, [20], Examples 5.5 and 5.12). The
oscillations of the stress process X(t) are at a time scale much shorter than those of
H(t) and $\Psi(t)$. It can be shown that the amplitude H(t) and the phase $\Psi(t)$ of X(t)
are diffusion processes defined by the stochastic differential equations


$$dH(t) = -\varepsilon^2\left(\nu_0 H(t) - \frac{\pi s_0}{2\nu_0^2 H(t)}\right)dt + \varepsilon\,\frac{\sqrt{\pi s_0}}{\nu_0}\,dB_1(t),$$

$$d\Psi(t) = \varepsilon\,\frac{\sqrt{\pi s_0}}{\nu_0 H(t)}\,dB_2(t) \tag{7.126}$$

driven by the independent Brownian motions $B_1(t)$ and $B_2(t)$ [52]. The first equation
takes the form

$$dR(t) = -\varepsilon^2\nu_0\left(R(t) - \frac{1}{2R(t)}\right)dt + \varepsilon\sqrt{\nu_0}\,dB_1(t), \tag{7.127}$$

by the change of variables $R(t) = H(t)/(\sqrt{2}\,\sigma_{st})$. Figure 7.44 shows a sample of
H(t) generated from (7.126) and the sample of X(t) calculated from (7.125) using
samples of H(t) and $\Psi(t)$.

Fig. 7.44 Samples of X(t) and H(t) for $\varepsilon = 0.3$, $s_0 = 1$, $\nu_0 = \pi$

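The Paris-Erdogan model predicts that a crack of length a extends by an amount
$\alpha\,[\eta(a)\sqrt{\pi a}\,(2H)]^{\beta}$ during a cycle of the far stress field with range 2H, where $\alpha$, $\beta$
are material constants and $\eta$ is a function of specimen geometry ([20], Sect. 7.5.2).
Since the duration of stress cycles is $2\pi/\nu_0$, a crack with length A(t) increases at a
rate

$$\dot{A}(t) = \frac{\nu_0}{2\pi}\,\alpha\left[\eta(A(t))\sqrt{\pi A(t)}\;2H(t)\right]^{\beta}, \tag{7.128}$$

so that (A(t), R(t)) is an $\mathbb{R}^2$-valued diffusion process satisfying the Itô stochastic
differential equations

$$dA(t) = \frac{\nu_0\,\alpha}{2\pi}\left(2\sqrt{2\pi}\,\sigma_{st}\right)^{\beta}\eta(A(t))^{\beta}A(t)^{\beta/2}R(t)^{\beta}\,dt,$$

$$dR(t) = -\varepsilon^2\nu_0\left(R(t) - \frac{1}{2R(t)}\right)dt + \varepsilon\sqrt{\nu_0}\,dB_1(t). \tag{7.129}$$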

Example 7.53 Suppose that the initial crack length $A(0) = a_0 < a_{cr}$ is deterministic
and known and that our objective is to calculate system reliability $P_s(t)$, that is, the
probability that the crack length does not exceed a critical value $a_{cr} > 0$ during a
time interval [0, t], t > 0.

An alternative form of (7.128) is $d\chi(A(t)) = R(t)^{\beta}\,dt$, where

$$d\chi(a) = \frac{c\,da}{\eta(a)^{\beta}a^{\beta/2}} \quad\text{and}\quad c = \frac{2\pi}{\nu_0\,\alpha\left(2\sqrt{2\pi}\,\sigma_{st}\right)^{\beta}}.$$

This gives $\chi(A(t)) - \chi(A(0)) = \int_0^t R(s)^{\beta}\,ds$ and $A(t) = \chi^{-1}(\chi(A(0)) + \bar{R}(t))$,
so that the failure probability $P_f(t) = 1 - P_s(t)$ can be calculated from

$$P_f(t) = P(A(t) > a_{cr}) = P(\bar{R}(t) > \chi(a_{cr}) - \chi(a_0)),$$

where $\bar{R}(t) = \int_0^t R(s)^{\beta}\,ds$. Additional information on stochastic fatigue crack
growth can be found in [20] (Sect. 7.5.2) and [63] (Sect. 9.2).

Fig. 7.45 Failure probabilities $P_f(t)$ for $a_{cr}$ = 1.7 and 2.0 in

Figure 7.45 shows estimates of failure probabilities $P_f(t) = 1 - P_s(t)$ for $\alpha = 6.6 \times 10^{-7}$,
$\beta = 2.25$, $s_0 = 100$, $\nu_0 = \pi$ rad/s, $a_0 = 1.5$ in, and two critical
crack lengths, $a_{cr}$ = 1.7 and 2.0 in. The estimates of $P_f(t)$ have been obtained from
100 independent samples of A(t). The left and the right panels of Fig. 7.46 show ten
samples of $\bar{R}(t)$ and A(t). ◊

The model in Example 7.53 assumes that system dynamics is not affected by
damage, that is, crack length. A possible consequence of system degradation is that
its modal frequency $\nu_0$ in (7.124) becomes a function of crack length A(t). As a result,
X(t) would become a nonstationary, non-Gaussian process. A heuristic solution to
this problem can be developed by using the observation that the temporal scales of
A(t) and X(t) differ significantly. For example, we may assume that A(t) is constant
over time intervals covering a large number of cycles of X(t) and that these intervals
are much larger than the duration of transients of X(t) following a change in modal
frequency. Under these assumptions, the reliability analysis in Example 7.53 can be
employed in the intervals of constant values of A(t).

Example 7.54 Suppose $\nu_0$ in (7.124) decreases suddenly to $\nu^* > 0$ at a time
$t^* \in (0, \tau)$ and that X(t) is stationary in $[t^*, \tau)$. The crack growth rates in the
time intervals $[0, t^*)$ and $[t^*, \tau)$ are $\dot{A}(t) = (1/c)\,\eta(A(t))^{\beta}A(t)^{\beta/2}R(t)^{\beta}$ and
$\dot{A}(t) = (1/c^*)\,\eta(A(t))^{\beta}A(t)^{\beta/2}R^*(t)^{\beta}$, where $c^* = 2\pi/[\nu^*\alpha\,(2\sqrt{2\pi}\,\sigma^*_{st})^{\beta}]$, $R^*(t)$
is given by (7.127) with $\nu^*$ in place of $\nu_0$, and $(\sigma^*_{st})^2 = \pi s_0/[2(\nu^*)^3]$. The left panel in
Fig. 7.47 shows the estimates of $P_f(t)$ in Fig. 7.45. The plots in the right panel are
estimates of $P_f(t)$ for $\nu^* = 0.8\nu_0$ obtained from 100 independent samples of A(t)
and R(t). The plots in the two panels of the figure show that modal frequency can
affect significantly our estimate of $P_f(t)$. ◊

Fig. 7.46 Samples of $\bar{R}(t)$ (left panel) and A(t) (right panel)

Fig. 7.47 Estimates of $P_f(t)$ for $\nu_0$ (left panel) and $\nu^* = 0.8\nu_0$ (right panel)


7.6 Exercises 

Exercise 7.1 Let X(t) be a real-valued process satisfying the differential equation
$\ddot{X}(t) + 2\zeta\nu_0\dot{X}(t) + \nu_0^2 X(t) = W(t)$, $t \ge 0$, with initial conditions X(0) = 0 and
$\dot{X}(0) = 0$, where $\zeta \in (0, 1)$, $\nu_0 > 0$, and W(t) denotes a real-valued white noise with
mean 0 and one-sided spectral density $g(\nu) = g_0 > 0$, $\nu \ge 0$. Find the expression
of the covariance matrix $\gamma(t)$ of the bivariate vector $(X(t), \dot{X}(t))$ as a function of
time. Show that the entries (1,1), (2,2), and (1,2) of the stationary covariance matrix
of $(X(t), \dot{X}(t))$ are $\pi g_0/(4\zeta\nu_0^3)$, $\pi g_0/(4\zeta\nu_0)$, and zero, respectively.

Exercise 7.2 Complete the proof of Theorem 7.2 by showing that the correlation
function of X(t) satisfies the third equation in (7.6).

Hint Multiply the equality $X_p(t) - X_p(s) = \int_s^t dX_p(u)$, t > s, by $X_q(s)$, calculate
the expectation of the resulting formula, and take its derivative with respect to time.

Exercise 7.3 Extend the proof of Theorem 7.2 to the case in which B(t) has dependent
coordinates. For example, set $B(t) = \beta(t)B^*(t)$, where $\beta(t)$ is a $(d', d^*)$ matrix
with real-valued entries and $B^*(t)$ denotes an $\mathbb{R}^{d^*}$-valued Brownian motion with
independent coordinates.

Exercise 7.4 Extend Theorem 7.2 to the case $S(t) = C(t) = \sum_{k=1}^{N(t)} Y_k$, where N(t)
is a homogeneous Poisson process with intensity $\lambda > 0$ and $\{Y_k\}$ are iid $\mathbb{R}^{d'}$-valued
random variables.

Hint Use Itô's formula in (5.16).

Exercise 7.5 Calculate the second moment properties of $Z(t) = Y(t)^k$, where Y(t)
is the process in Example 7.3.

Exercise 7.6 Derive the differential equation for the covariance function of the state
of a linear system driven by the colored noise given by (7.14).

Exercise 7.7 Calculate and plot the correlation function of X(t) in Example 7.4 for
k = 3.

Exercise 7.8 Repeat the calculations in Exercise 7.7 with B(t) replaced by a compound
Poisson process C(t) with the same second moment properties as B(t).

Exercise 7.9 Derive the partial differential equations for the characteristic and density
functions for the diffusion processes in Example 7.11.

Hint The differential equations for the characteristic and distribution functions of
X(t) driven by Gaussian white noise can be obtained directly from (7.17) and (7.18).
For X(t) driven by Poisson white noise, apply Itô's formula. Use the independence
between X(s−) and $\Delta C(s) = C(s) - C(s-)$ and the fact that $C(s + \Delta s) - C(s)$
for small $\Delta s > 0$ is 0 or $Y_k$ for some k with probabilities $1 - \lambda\Delta s$ or $\lambda\Delta s$, respectively.

Exercise 7.10 Calculate the mean and covariance of $X_n$ conditional on a sample
$(T_0 = 0, T_1(\omega), \ldots)$ of the jump times $(T_0, T_1, \ldots)$ of the semi-Markov sequence in
Example 7.17.

Exercise 7.11 Derive the expression of Aj, + | in (7.39). 

Exercise 7.12 Prove the last equality in (7.44) and find the second moment properties
of X lh o + eX n 1 .

Exercise 7.13 Show that W given by (7.46) is a linear space, (7.47) defines an inner 
product on W , and W with the norm induced by this inner product is a Hilbert space. 

Exercise 7.14 Find the moment and the Fokker-Planck equations for the process in 
(7.76), that are given in Example 7.29. 
Exercise 7.15 Develop bounds as in Sect. 7.5.3.2 on the error of SROM-based solutions
for simple linear oscillators and multi-degree of freedom linear systems with
random stiffness and damping that are subjected to Gaussian white noise.

Exercise 7.16 Calculate the correlation function of $\tilde{X}(t)$ in Example 7.36 and
establish conditions under which it converges to the correlation function of X(t)
as $m \to \infty$.
Chapter 8

Stochastic Algebraic Equations 


8.1 Introduction 

Let A and B be random matrices defined on the probability spaces $(\Omega_1, \mathcal{F}_1, P_1)$ and
$(\Omega_2, \mathcal{F}_2, P_2)$, respectively, that may or may not be distinct. Let U be an $\mathbb{R}^d$-valued
random variable defined by

$$AU = B, \tag{8.1}$$

referred to as a stochastic algebraic equation (SAE). The solution U of this equation
is defined on the product probability space $(\Omega_1 \times \Omega_2, \mathcal{F}_1 \times \mathcal{F}_2, P_1 \times P_2)$. If A
and B are defined on the same probability space $(\Omega, \mathcal{F}, P)$, so is U. SAEs may
result from equilibrium conditions for physical systems with uncertain properties 
described by models with a finite number of degrees of freedom. They also result 
from time-invariant stochastic partial differential equations by discretizing both the 
physical space and the probability space. The physical space can be discretized by 
solving finite difference or finite element representations of these equations that have 
a finite number of degrees of freedom. The probability space can be discretized by 
replacing the random fields in the definition of stochastic differential equations by 
parametric models, that is, deterministic functions of spatial coordinates that depend 
on a finite number of random variables. 

Monte Carlo simulation is the only general method for estimating the probability 
law of U in (8.1), but can be impractical in realistic applications since it involves 
repeated solutions of (8.1) for samples of A and B. This limitation has promoted the 
development of a broad range of approximate solutions for (8.1), that we divide into 
two classes depending on the degree of uncertainty in the random entries of A and B. 
The Monte Carlo simulation, stochastic reduced order models, stochastic Galerkin, 
stochastic collocation, and reliability methods are used in Sect. 8.2 to solve SAEs 
with random parameters of arbitrary uncertainty. The Taylor, perturbation, Neumann 
series, and equivalent linearization methods in Sect. 8.3 are for SAEs with random 
entries of relatively low uncertainty. 


M. Grigoriu, Stochastic Systems, Springer Series in Reliability Engineering,
DOI: 10.1007/978-1-4471-2327-9_8, © Springer-Verlag London Limited 2012




The implementation of most methods for solving (8.1) requires that $A^{-1}$ exists
a.s. Unfortunately, there is no simple criterion establishing whether a random matrix
can be inverted with probability 1. Available results are limited to special classes
of random matrices, for example, deterministic matrices with entries polluted by
independent Gaussian variables [12] and other types of matrices [20], that are not
sufficiently general to be useful in applications. For solutions of SAEs that require
the a.s. existence of $A^{-1}$, we may need to construct histograms of the determinant of
A or of the condition number $\|A\|\,\|A^{-1}\|$ of A to assess in an approximate manner
whether or not A can be inverted, unless the structure of A suggests alternative
techniques for assessing the existence of its inverse.

Example 8.1. Consider (8.1) with d = 3 and

$$A = \begin{bmatrix} X_1 + X_2 & -X_2 & 0 \\ -X_2 & X_2 + X_3 & -X_3 \\ 0 & -X_3 & X_3 \end{bmatrix},$$

where $\{X_i\}$ are uniformly distributed in $(a_i, b_i)$, i = 1, 2, 3, with $a_1 = 2.5$, $b_1 = 3.5$,
$a_2 = 2$, $b_2 = 3$, $a_3 = 1.5$, and $b_3 = 2.5$. The support of a histogram for the
determinant of A obtained from 1000 independent samples of A is [8.05, 25.42], and
changes slightly if the sample size is increased. This suggests that A is invertible a.s.,
so that it is expected that (8.1) has a unique solution with probability 1. ◊
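
The sampling experiment in this example can be sketched in a few lines of Python. The code below is an illustration rather than the procedure used to produce the reported histogram: the function name, the seed, and the assumption that $X_1$, $X_2$, $X_3$ are mutually independent are ours. It draws 1000 samples of A and computes the determinant and the condition number of each sample. For this particular matrix, expanding the determinant along the first row gives $\det A = X_1 X_2 X_3$, so the determinant takes values in (7.5, 26.25) and is strictly positive, which is consistent with the sample support reported above.

```python
import numpy as np

def sample_A(rng, n):
    """Draw n samples of the 3x3 matrix A and of (X1, X2, X3).

    Independence of X1, X2, X3 is an assumption of this sketch.
    """
    lo = np.array([2.5, 2.0, 1.5])
    hi = np.array([3.5, 3.0, 2.5])
    x = rng.uniform(lo, hi, size=(n, 3))
    x1, x2, x3 = x[:, 0], x[:, 1], x[:, 2]
    A = np.zeros((n, 3, 3))
    A[:, 0, 0] = x1 + x2
    A[:, 0, 1] = A[:, 1, 0] = -x2
    A[:, 1, 1] = x2 + x3
    A[:, 1, 2] = A[:, 2, 1] = -x3
    A[:, 2, 2] = x3
    return A, x

rng = np.random.default_rng(0)      # arbitrary seed
A, x = sample_A(rng, 1000)
dets = np.linalg.det(A)             # one determinant per sample
conds = np.linalg.cond(A)           # 2-norm condition numbers
print("determinant range:", dets.min(), dets.max())
print("largest condition number:", conds.max())
```

A histogram of `dets` or `conds` (for example with `numpy.histogram`) gives the approximate assessment of invertibility described in the text.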

The special case of (8.1) with A deterministic is not discussed since, if $A^{-1}$
exists, then $U = A^{-1}B$ is a linear mapping of random vector B, so that the mean
and correlation matrices of the solution are $E[U] = A^{-1}E[B]$ and $E[UU'] =
A^{-1}E[BB'](A^{-1})'$, respectively. Higher order statistics of U can be obtained efficiently
by, for example, Monte Carlo simulation since the mapping $B \mapsto U = A^{-1}B$
is known. If B is Gaussian, its first two moments suffice to find the probability law
of U.

Also, interval solutions for (8.1) are not considered since probability statements
based on these solutions are rather limited. For example, suppose the vectors $V_a$
and $V_b$ contain the random entries of A and B and that they take values in some
bounded subsets $D_a$ and $D_b$. Interval analysis constructs a set I that contains the
solution U of (8.1) for $V_a \in D_a$ and $V_b \in D_b$. If $V_a$ and $V_b$ are independent, then
$P(U \in I) \ge P(A \in D_a)\,P(B \in D_b)$ [7].


8.2 SAEs with Arbitrary Uncertainty 


Monte Carlo simulation, stochastic reduced order models, stochastic Galerkin, stochastic
collocation, and reliability methods are used to solve (8.1) approximately.
Conditional analysis and state augmentation applied in the previous chapter to solve
ordinary differential equations with random coefficients are of limited value when
dealing with SAEs, and are not discussed.

The Monte Carlo, stochastic reduced order models, and stochastic collocation
methods are non-intrusive, that is, existing deterministic software can be used to
construct estimates and/or approximations for the solution of (8.1). On the other
hand, the stochastic Galerkin method does not have this feature, and is said to be
intrusive. A non-intrusive Galerkin method has been proposed in [18].


8.2.1 General Considerations 

Strong and weak solutions can be constructed for (8.1). Strong solutions exist if
$A^{-1}$ is defined a.s., and their existence is required by, for example, the Monte Carlo
method. Weak solutions are particularly needed for the stochastic Galerkin method.
We define weak solutions and assess their accuracy by bounds. Two types of bounds
are presented. The first type is given by Theorem 8.1 and relates to weak solutions for
(8.1). The second type are bounds on the discrepancy between solutions of distinct
deterministic algebraic equations (Theorems 8.2, 8.3).

Let A be a (d, d)-random matrix defined on a probability space $(\Omega, \mathcal{F}, P)$ and
$\mathcal{B}: L^2(\Omega, \mathcal{F}, P) \times L^2(\Omega, \mathcal{F}, P) \to \mathbb{C}$ a functional defined by

$$\mathcal{B}(U, V) = E[(AU)'V^*], \quad U, V \in L^2(\Omega, \mathcal{F}, P). \tag{8.2}$$

It is assumed that A is Hermitian and positive definite a.s. and that there exist constants
c > 0 and $\alpha > 0$ such that $|\mathcal{B}(U, V)| = |E[(AU)'V^*]| \le c\,\|U\|\,\|V\|$ and
$\mathcal{B}(U, U) = E[(AU)'U^*] \ge \alpha\,\|U\|^2$, where $\|\cdot\|$ denotes the norm in $L^2(\Omega, \mathcal{F}, P)$.
We say that $\mathcal{B}$ with these properties is bounded and positive definite (Sect. B.4.2).

Note that $(z^*)'Az = z'A^*z^* = z'A'z^* = (Az)'z^* \in \mathbb{R}$ for $z \in \mathbb{C}^d$ since A
is Hermitian, that is, $A_{ij} = A_{ji}^*$, i, j = 1, ..., d, and that $(z^*)'Az > 0$ a.s. for
$z \in \mathbb{C}^d \setminus \{0\}$ since A is positive definite. Since A is Hermitian and positive definite,
it admits the Cholesky decomposition $A = L(L^*)'$, where L is a lower triangular
matrix.

Definition 8.1 The weak solution of (8.1) is an $\mathbb{R}^d$-valued random variable
$U \in L^2(\Omega, \mathcal{F}, P)$ satisfying the equation

$$\mathcal{B}(U, V) = \langle B, V\rangle, \quad \forall V \in L^2(\Omega, \mathcal{F}, P), \tag{8.3}$$

where $\langle\cdot,\cdot\rangle$ denotes the inner product in $\mathbb{R}^d$.

The weak form (8.3) of (8.1) results by multiplying AU = B with an arbitrary
vector $V \in L^2(\Omega, \mathcal{F}, P)$ and taking the expectation of the resulting equation. If
$B \in L^2(\Omega, \mathcal{F}, P)$, the functional $\langle B, V\rangle$ of V will be linear and bounded. The
bilinear functional $\mathcal{B}$ in (8.2) is bounded and elliptic under the stated assumptions.
Various conditions can be imposed on matrix A such that $\mathcal{B}$ is bounded and elliptic.
A condition on A assuring that $\mathcal{B}$ is bounded is given by the following example.
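Example 8.2 If there exists a constant $\beta > 0$ such that $\sum_{i=1}^{d}\left(\sum_{j=1}^{d} |A_{ij}|\right)^2 \le \beta^2$
a.s., then the bilinear form $\mathcal{B}$ is bounded, that is, $|\mathcal{B}(U, V)| \le \beta\,\|U\|\,\|V\|$ for all
$U, V \in L^2(\Omega, \mathcal{F}, P)$. ◊

Proof We have $|E[(AU)'V^*]| \le \left(E[(AU)'(AU)^*]\right)^{1/2}\|V\|$ by the Cauchy-Schwarz
inequality, and

$$E[(AU)'(AU)^*] = E\left[\sum_{i=1}^{d}\Big|\sum_{j=1}^{d} A_{ij}U_j\Big|^2\right] \le E\left[\sum_{i=1}^{d}\Big(\sum_{j=1}^{d} |A_{ij}|\,|U_j|\Big)^2\right] \le E\left[\sum_{i=1}^{d}\Big(\sum_{j=1}^{d} |A_{ij}|\,\|U\|_{\mathbb{R}^d}\Big)^2\right]$$

$$= E\left[\sum_{i=1}^{d}\Big(\sum_{j=1}^{d} |A_{ij}|\Big)^2\,\|U\|_{\mathbb{R}^d}^2\right] \le \beta^2 E\left[\|U\|_{\mathbb{R}^d}^2\right] = \beta^2\|U\|^2,$$

where $\|U\|_{\mathbb{R}^d}^2 = \sum_{k=1}^{d} |U_k|^2$ is the square of the Euclidean norm in $\mathbb{R}^d$. These
observations give $|\mathcal{B}(U, V)| = |E[(AU)'V^*]| \le \beta\,\|U\|\,\|V\|$. ▲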
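
The inequality under the expectation in this proof holds sample by sample, which makes it easy to check numerically. The Python sketch below is an illustration only; the helper names, the seed, and the choice of complex Gaussian test matrices and vectors are assumptions. For each draw it compares $|(Au)'v^*|$ with $\beta(A)\,\|u\|\,\|v\|$, where $\beta(A)^2 = \sum_i (\sum_j |A_{ij}|)^2$ is computed for that sample. The matrices are not required to be Hermitian here, since the boundedness argument does not use that property.

```python
import numpy as np

def row_sum_constant(A):
    """beta(A) = sqrt(sum_i (sum_j |A_ij|)^2) for a single matrix A."""
    return np.sqrt(np.sum(np.sum(np.abs(A), axis=1) ** 2))

def bilinear_sample(A, u, v):
    """Sample value (A u)' v* of the integrand defining the bilinear form."""
    return (A @ u) @ np.conj(v)

rng = np.random.default_rng(1)      # arbitrary seed
d = 4
ratios = []
for _ in range(1000):
    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    u = rng.normal(size=d) + 1j * rng.normal(size=d)
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    bound = row_sum_constant(A) * np.linalg.norm(u) * np.linalg.norm(v)
    ratios.append(abs(bilinear_sample(A, u, v)) / bound)

print("largest ratio |(Au)'v*| / (beta ||u|| ||v||):", max(ratios))
```

Taking expectations of the sample-wise inequality, with $\beta$ an a.s. upper bound on $\beta(A)$, gives the statement of the example.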

Let X denote an $\mathbb{R}^{d_x}$-valued random variable defined on a probability space
$(\Omega, \mathcal{F}, P)$ that collects all distinct random entries in matrices A and B. Let $\tilde{X}$ be
another $d_x$-dimensional random vector defined on the same probability space as X.
Denote by $\tilde{U}$ the solution of the SAE $\tilde{A}\tilde{U} = \tilde{B}$, where $(\tilde{A}, \tilde{B})$ are (A, B) with $\tilde{X}$ in
place of X. If $\tilde{X}(\Omega) \subseteq \Gamma = X(\Omega)$, then $\tilde{A}$ has the same properties as A, the functional
$\tilde{\mathcal{B}}(U, V) = E[(\tilde{A}U)'V^*]$ is well defined and has the same properties as $\mathcal{B}$.

We consider both strong and weak solutions for the SAEs AU = B and
$\tilde{A}\tilde{U} = \tilde{B}$. The weak solutions of these equations satisfy $\mathcal{B}(U, V) = E[B'V^*]$ and
$\tilde{\mathcal{B}}(\tilde{U}, V) = E[\tilde{B}'V^*]$ for all $V \in L^2(\Omega, \mathcal{F}, P)$. The relationship between weak
and strong solutions follows from that between weak and strong convergence for
random sequences in Hilbert spaces (Sect. B.4.3). The existence and uniqueness of
weak solutions is guaranteed by the Lax-Milgram theorem (Theorem B.44) and Theorem
8.8 provided $E[B'V^*]$ and $E[\tilde{B}'V^*]$ are bounded for all $V \in L^2(\Omega, \mathcal{F}, P)$.

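Theorem 8.1 If U and $\tilde{U}$ are weak solutions of AU = B and $\tilde{A}\tilde{U} = \tilde{B}$, matrices
A and $\tilde{A}$ have the properties in Definition 8.1, $B, \tilde{B} \in L^2(\Omega, \mathcal{F}, P)$, and
$\sum_{i=1}^{d}\left(\sum_{j=1}^{d} |\tilde{A}_{ij} - A_{ij}|\right)^2 \le \mu(\tilde{A}, A)^2$ a.s. for a constant $\mu(\tilde{A}, A) > 0$, then

$$\|\tilde{U} - U\| \le \frac{1}{\alpha}\left(\|\tilde{B} - B\| + \mu(\tilde{A}, A)\,\|\tilde{U}\|\right), \tag{8.4}$$

where $\alpha > 0$ is a constant such that $\mathcal{B}(V, V) \ge \alpha\|V\|^2$ for all $V \in L^2(\Omega, \mathcal{F}, P)$.

Proof We have

$$|\mathcal{B}(\tilde{U} - U, V)| \le |\tilde{\mathcal{B}}(\tilde{U}, V) - \mathcal{B}(U, V)| + |\mathcal{B}(\tilde{U}, V) - \tilde{\mathcal{B}}(\tilde{U}, V)|$$

$$= |E[(\tilde{B} - B)'V^*]| + |E[((\tilde{A} - A)\tilde{U})'V^*]| \le \left(\|\tilde{B} - B\| + \|(\tilde{A} - A)\tilde{U}\|\right)\|V\| \le \left(\|\tilde{B} - B\| + \mu(\tilde{A}, A)\,\|\tilde{U}\|\right)\|V\|$$

by the Cauchy-Schwarz inequality, the assumption on the discrepancy between
$\tilde{A}$ and A, and the inequality

$$\|(\tilde{A} - A)\tilde{U}\|^2 = E[((\tilde{A} - A)\tilde{U})'((\tilde{A} - A)\tilde{U})^*] \le \mu(\tilde{A}, A)^2\,\|\tilde{U}\|^2,$$

that follows by arguments similar to those used in Example 8.2. The inequalities
$|\mathcal{B}(\tilde{U} - U, V)| \le (\|\tilde{B} - B\| + \mu(\tilde{A}, A)\|\tilde{U}\|)\|V\|$ and $\alpha\|V\|^2 \le |\mathcal{B}(V, V)|$ for
$V = \tilde{U} - U$ give

$$\alpha\,\|\tilde{U} - U\|^2 \le |\mathcal{B}(\tilde{U} - U, \tilde{U} - U)| \le \left(\|\tilde{B} - B\| + \mu(\tilde{A}, A)\,\|\tilde{U}\|\right)\|\tilde{U} - U\|,$$

which yields (8.4) by division with $\|\tilde{U} - U\|$. As expected, the bound on the discrepancy
between $\tilde{U}$ and U depends on differences between $(\tilde{A}, \tilde{B})$ and (A, B), and
vanishes as $(\tilde{A}, \tilde{B})$ approaches (A, B). ▲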

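Theorem 8.2 Let $u^{(i)}$ and $\tilde{u}^{(k)}$ denote solutions of (8.1) for X set equal to samples
$x^{(i)}$ and $\tilde{x}^{(k)}$ of X and $\tilde{X}$, respectively. Denote by $(A^{(i)}, B^{(i)})$ and $(\tilde{A}^{(k)}, \tilde{B}^{(k)})$ the
matrices (A, B) for $X = x^{(i)}$ and $X = \tilde{x}^{(k)}$, respectively. If $\|B^{(i)}\| \ne 0$, then

$$\frac{\|\Delta u^{(k,i)}\|}{\|u^{(i)}\|} \le \|(\tilde{A}^{(k)})^{-1}\|\,\|A^{(i)}\|\left(\frac{\|\Delta B^{(k,i)}\|}{\|B^{(i)}\|} + \frac{\|\Delta A^{(k,i)}\|}{\|A^{(i)}\|}\right), \tag{8.5}$$

where $\Delta A^{(k,i)} = \tilde{A}^{(k)} - A^{(i)}$, $\Delta B^{(k,i)} = \tilde{B}^{(k)} - B^{(i)}$, and $\Delta u^{(k,i)} = \tilde{u}^{(k)} - u^{(i)}$
and the norms are Euclidean norms for vectors and corresponding induced norms
for matrices.

Proof Set $(\tilde{A}^{(k)})^{-1} = (A^{(i)})^{-1} + \Delta^{-1}$. Since $\Delta u^{(k,i)} = (\tilde{A}^{(k)})^{-1}\tilde{B}^{(k)} - (A^{(i)})^{-1}B^{(i)}$,
we have

$$\Delta u^{(k,i)} = \left((A^{(i)})^{-1} + \Delta^{-1}\right)\left(B^{(i)} + \Delta B^{(k,i)}\right) - (A^{(i)})^{-1}B^{(i)}$$

$$= (A^{(i)})^{-1}B^{(i)} + \Delta^{-1}B^{(i)} + (\tilde{A}^{(k)})^{-1}\Delta B^{(k,i)} - (A^{(i)})^{-1}B^{(i)} = (\tilde{A}^{(k)})^{-1}\left(\Delta B^{(k,i)} - \Delta A^{(k,i)}u^{(i)}\right).$$

The expression of $\Delta u^{(k,i)}$ results from $I = \tilde{A}^{(k)}(\tilde{A}^{(k)})^{-1} = I + \Delta A^{(k,i)}(A^{(i)})^{-1} +
\tilde{A}^{(k)}\Delta^{-1}$ or $\tilde{A}^{(k)}\Delta^{-1} = -\Delta A^{(k,i)}(A^{(i)})^{-1}$, which gives $\Delta^{-1} = -(\tilde{A}^{(k)})^{-1}\Delta A^{(k,i)}
(A^{(i)})^{-1}$ and $\Delta^{-1}B^{(i)} = -(\tilde{A}^{(k)})^{-1}\Delta A^{(k,i)}u^{(i)}$. The above expression of $\Delta u^{(k,i)}$ implies

$$\|\Delta u^{(k,i)}\| \le \|(\tilde{A}^{(k)})^{-1}\|\left(\|\Delta B^{(k,i)}\| + \|\Delta A^{(k,i)}\|\,\|u^{(i)}\|\right),$$

which together with

$$\|\Delta B^{(k,i)}\| = \frac{\|\Delta B^{(k,i)}\|}{\|B^{(i)}\|}\,\|B^{(i)}\| \le \frac{\|\Delta B^{(k,i)}\|}{\|B^{(i)}\|}\,\|A^{(i)}\|\,\|u^{(i)}\|,$$

yields (8.5). ▲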
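
A numerical check of the relative error bound in Theorem 8.2 can be sketched as follows. The code is an illustration under stated assumptions: the function name, the seed, and the way the pairs $(A^{(i)}, B^{(i)})$ and $(\tilde{A}^{(k)}, \tilde{B}^{(k)})$ are generated (a well-conditioned random matrix plus a small perturbation) are ours, and vector norms are Euclidean with the induced 2-norm for matrices, as in the theorem.

```python
import numpy as np

def relative_error_and_bound(A, B, A_t, B_t):
    """Return ||u_t - u|| / ||u|| and the right-hand side of the bound (8.5)."""
    u = np.linalg.solve(A, B)           # solution for (A, B)
    u_t = np.linalg.solve(A_t, B_t)     # solution for the perturbed pair
    rel_err = np.linalg.norm(u_t - u) / np.linalg.norm(u)
    norm_A = np.linalg.norm(A, 2)
    bound = (np.linalg.norm(np.linalg.inv(A_t), 2) * norm_A
             * (np.linalg.norm(B_t - B) / np.linalg.norm(B)
                + np.linalg.norm(A_t - A, 2) / norm_A))
    return rel_err, bound

rng = np.random.default_rng(3)          # arbitrary seed
d = 5
results = []
for _ in range(200):
    M = rng.normal(size=(d, d))
    A = M @ M.T + d * np.eye(d)         # symmetric, eigenvalues >= d
    B = rng.normal(size=d)
    A_t = A + 0.05 * rng.normal(size=(d, d))
    B_t = B + 0.05 * rng.normal(size=d)
    results.append(relative_error_and_bound(A, B, A_t, B_t))

worst = max(err / bound for err, bound in results)
print("largest ratio of relative error to bound:", worst)
```

The ratio stays below 1 for every pair, and it is typically well below 1 because the bound uses worst-case norm inequalities.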

Theorem 8.3 With the notation in Theorem 8.2, let $\lambda_{\max}$ denote the largest real
part of the eigenvalues of $(A^{(i)} + (A^{(i)})')/2$. If $\lambda_{\max} < 0$ and the real part of the
eigenvalues of $\tilde{A}^{(k)}$ are negative, then

$$\|\Delta u^{(k,i)}\| \le -\frac{1}{\lambda_{\max}}\,\|-\Delta A^{(k,i)}\tilde{u}^{(k)} + \Delta B^{(k,i)}\| \le -\frac{1}{\lambda_{\max}}\left(\|\Delta A^{(k,i)}\|\,\|(\tilde{A}^{(k)})^{-1}\|\,\|\tilde{B}^{(k)}\| + \|\Delta B^{(k,i)}\|\right). \tag{8.6}$$

Proof Note that the condition $\lambda_{\max} < 0$ is stronger than the requirement that the
eigenvalues of $A^{(i)}$ have strictly negative real parts [25] and that (8.6) does not hold
for $\lambda_{\max} > 0$.

The discrepancy $\|\tilde{Y}^{(k)}(t) - Y^{(i)}(t)\|$ between the solutions of the linear differential
equations $\dot{Y}^{(i)}(t) = A^{(i)}Y^{(i)}(t) - B^{(i)}$ and $\dot{\tilde{Y}}^{(k)}(t) = \tilde{A}^{(k)}\tilde{Y}^{(k)}(t) - \tilde{B}^{(k)}$ can be
bounded by ([13], Theorem 10.6, and Sect. 7.5.3.1 in this book)

$$\|\tilde{Y}^{(k)}(t) - Y^{(i)}(t)\| \le e^{\lambda_{\max}t}\int_0^t e^{-\lambda_{\max}s}\,\|(A^{(i)} - \tilde{A}^{(k)})\tilde{Y}^{(k)}(s) + (\tilde{B}^{(k)} - B^{(i)})\|\,ds. \tag{8.7}$$

Since the above differential equations admit steady-state solutions by assumption,
$\lim_{t\to\infty} Y^{(i)}(t) = (A^{(i)})^{-1}B^{(i)}$ and $\lim_{t\to\infty} \tilde{Y}^{(k)}(t) = (\tilde{A}^{(k)})^{-1}\tilde{B}^{(k)}$, that is, the
solutions of the algebraic equations $A^{(i)}u^{(i)} = B^{(i)}$ and $\tilde{A}^{(k)}\tilde{u}^{(k)} = \tilde{B}^{(k)}$, respectively.
The bound in (8.6) results from (8.7) in the limit as $t \to \infty$. ▲
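
The two bounds in (8.6) can also be checked on randomly generated matrices. In the Python sketch below, which is illustrative only, the function name, the seed, and the construction of the test matrices are assumptions: A is built with a negative definite symmetric part, so that $\lambda_{\max} < 0$, and the perturbed matrix differs from it by a small random matrix, so that its eigenvalues keep negative real parts.

```python
import numpy as np

def steady_state_bounds(A, B, A_t, B_t):
    """Return ||u_t - u|| and the two bounds in (8.6), in that order."""
    lam_max = np.linalg.eigvalsh((A + A.T) / 2.0).max()
    if lam_max >= 0:
        raise ValueError("the symmetric part of A must be negative definite")
    u = np.linalg.solve(A, B)
    u_t = np.linalg.solve(A_t, B_t)
    dA, dB = A_t - A, B_t - B
    err = np.linalg.norm(u_t - u)
    first = np.linalg.norm(dB - dA @ u_t) / (-lam_max)
    second = (np.linalg.norm(dA, 2) * np.linalg.norm(np.linalg.inv(A_t), 2)
              * np.linalg.norm(B_t) + np.linalg.norm(dB)) / (-lam_max)
    return err, first, second

rng = np.random.default_rng(4)          # arbitrary seed
d = 4
checks = []
for _ in range(200):
    M = rng.normal(size=(d, d))
    S = rng.normal(size=(d, d))
    A = -(M @ M.T + np.eye(d)) + (S - S.T)   # symmetric part <= -I
    B = rng.normal(size=d)
    A_t = A + 0.02 * rng.normal(size=(d, d))
    B_t = B + 0.02 * rng.normal(size=d)
    checks.append(steady_state_bounds(A, B, A_t, B_t))

print("first few (error, bound 1, bound 2):", checks[:3])
```

For real matrices, the first inequality follows from $A^{(i)}\Delta u^{(k,i)} = \Delta B^{(k,i)} - \Delta A^{(k,i)}\tilde{u}^{(k)}$ and $\|(A^{(i)})^{-1}\| \le 1/|\lambda_{\max}|$, which gives an algebraic route to the same result as the limit argument in the proof.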


8.2.2 Monte Carlo Method 

Estimates of moments and other properties of the solution U of (8.1) can be
constructed from samples of this random vector calculated from samples of A
and B, that are derived from samples of X. The methods in Sects. 2.13.1 to 2.13.3 can
be used to generate samples of X and construct estimators for properties of U. It is
assumed throughout this section that $A^{-1}$ exists with probability 1.

Example 8.3 Consider the one-dimensional problem AU = 1 with A uniformly 
distributed in (a i, aj), 0 < a\ < aj < oo. The unique solution of this stochastic 
algebraic equation is U = 1/A. Figure 8.1 shows with solid and dotted lines the 
coefficients of variation of U and A as a function of a \ for aj = I ■ Since the 
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Fig. 8.1 Coefficients of 
variation of A ( dotted line) 
and U ( solid line ) for an = 1 



a l 


uncertainty in U is much larger than that in A for small values of a\ , estimates of 
U need to be based on larger samples than those of A. However, it is not possible 
to determine the sample size needed to estimate moments and other properties of U 
to a specified accuracy since, in contrast to A, the probability law of U is unknown 
prior to calculations. O 

Proof The mean and variance of $A$ are $(a_1 + a_2)/2$ and $(a_2 - a_1)^2/12$ so that
its coefficient of variation is $(a_2 - a_1)/\big((a_1 + a_2)\sqrt{3}\big)$. The moments
$E[U^q] = (a_2^{1-q} - a_1^{1-q})/\big((1 - q)(a_2 - a_1)\big)$ for $q > 1$ and $E[U^q] = \ln(a_2/a_1)/(a_2 - a_1)$ for
$q = 1$ can be used to calculate the coefficient of variation of $U$. ▲
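The closed-form expressions in this proof can be evaluated directly. The following is a minimal sketch, not the code used to produce Fig. 8.1; the function names and the values of $a_1$ in the loop are illustrative assumptions.

```python
import math

def cov_A(a1, a2):
    """Coefficient of variation of A ~ U(a1, a2)."""
    return (a2 - a1) / ((a1 + a2) * math.sqrt(3.0))

def moment_U(q, a1, a2):
    """E[U^q] for U = 1/A with A ~ U(a1, a2) and integer q >= 1."""
    if q == 1:
        return math.log(a2 / a1) / (a2 - a1)
    return (a2 ** (1 - q) - a1 ** (1 - q)) / ((1 - q) * (a2 - a1))

def cov_U(a1, a2):
    """Coefficient of variation of U = 1/A from its first two moments."""
    m1 = moment_U(1, a1, a2)
    m2 = moment_U(2, a1, a2)
    return math.sqrt(m2 - m1 ** 2) / m1

# Assumed values of a1; a2 = 1 as in Fig. 8.1.
for a1 in (0.01, 0.1, 0.5):
    print(a1, cov_A(a1, 1.0), cov_U(a1, 1.0))
```

For small $a_1$ the printed coefficient of variation of $U$ exceeds that of $A$, which is the behavior described in Example 8.3.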

Example 8.4 Let $\hat U_q = \sum_{k=1}^n U_k^q / n$ be an estimator for the moment $E[U^q]$ of
order $q$ for $U$ in Example 8.3, where $\{U_k\}$ are independent copies of $U$. The mean
and variance of $\hat U_q$ are $E[U^q]$ and $\mathrm{Var}[U^q]/n$ so that $\hat U_q$ is an unbiased estimator
of $E[U^q]$ whose accuracy improves with the sample size $n$. ◊

Proof We have $E[\hat U_q] = \sum_{k=1}^n E[U_k^q]/n = E[U^q]$ so that $\hat U_q$ is an unbiased
estimator for $E[U^q]$. Since $U_k$ are iid random variables, we have
$E[(\hat U_q)^2] = \big[\sum_{k=1}^n E[U_k^{2q}] + \sum_{k,l=1,\,k \ne l}^n E[U_k^q U_l^q]\big]/n^2 = \big[n E[U^{2q}] + (n^2 - n) E[U^q]^2\big]/n^2$,
which gives the stated result since $E[\hat U_q] = E[U^q]$. ▲
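The two properties of the estimator in Example 8.4 can be checked numerically. The sketch below repeats the estimation many times and compares the empirical mean and variance of the estimates with $E[U^q]$ and $\mathrm{Var}[U^q]/n$; the values of $a_1$, $a_2$, $q$, $n$, the number of repetitions, and the seed are assumptions made for illustration.

```python
import numpy as np

def exact_moment(q, a1, a2):
    """E[U^q] for U = 1/A with A ~ U(a1, a2) and integer q >= 2."""
    return (a2 ** (1 - q) - a1 ** (1 - q)) / ((1 - q) * (a2 - a1))

rng = np.random.default_rng(0)      # fixed seed, illustrative only
a1, a2, q = 0.5, 1.0, 2             # assumed parameter values
n, reps = 200, 4000                 # sample size and number of repetitions

# Each row holds n independent copies of U; each row yields one estimate.
u = 1.0 / rng.uniform(a1, a2, size=(reps, n))
estimates = np.mean(u ** q, axis=1)

mean_exact = exact_moment(q, a1, a2)
var_exact = (exact_moment(2 * q, a1, a2) - mean_exact ** 2) / n   # Var[U^q] / n

print(estimates.mean(), mean_exact)   # unbiasedness
print(estimates.var(), var_exact)     # variance decays like 1/n
```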

Considerations in Examples 8.3 and 8.4 extend directly to the matrix equation
given by (8.1). Let $\{U_i^{(k)}\}$ be independent copies of $U_i = \sum_{s=1}^d A^{-1}_{is} B_s$, where $A^{-1}_{is}$
denotes entry $(i, s)$ of $A^{-1}$. The mean and variance of the estimator

$$
\hat U_{i,q} = \frac{1}{n} \sum_{k=1}^n \big(U_i^{(k)}\big)^q \tag{8.8}
$$

of $E[U_i^q]$ are $E[U_i^q]$ and $\mathrm{Var}[U_i^q]/n$, that is, $\hat U_{i,q}$ is an unbiased estimator and its
variance approaches 0 as $n \to \infty$. Similar unbiased estimators can be developed for the
distributions of coordinates and functions of $U$. For example,

$$
\hat F_h(\xi) = \frac{1}{n} \sum_{k=1}^n 1\big(h(U^{(k)}) \le \xi\big) \tag{8.9}
$$

is an estimator for the distribution $F_h$ of the real-valued random variable $h(U)$,
where $h : \mathbb{R}^d \to \mathbb{R}$ is a measurable function and $\{U^{(k)}\}$ are independent copies
of $U$.

Fig. 8.2 Coefficients of variation of $U^{2q}$ for $q = 1, \ldots, 10$ and $a_2 = 1$

The computational effort for constructing estimators of the types in (8.8) and
(8.9) has two distinct components, the time for calculating a single sample of the
solution $U$ of (8.1) and the sample size $n$ needed to estimate accurately properties
of functions of $U$. A relatively large computation time for a single
sample of $U$ is not by itself a sufficient argument for labeling Monte Carlo inefficient.
The label may be inadequate if, for example, estimators with satisfactory accuracy
can be obtained from a relatively small number of samples, in which case Monte
Carlo can be competitive. For example, Fig. 8.2 shows coefficients of variation for
$U^{2q}$, $q = 1, \ldots, 10$, as a function of $a_1$, where $U$ is the solution of the stochastic
algebraic equation $AU = 1$ in Example 8.3 for $a_2 = 1$. The ordinates of the graphs
increase with $q$ for each $a_1$. The plots suggest that large samples of $U$ are needed
to obtain satisfactory estimates for $E[U^{2q}]$ if $a_1$ is close to 0. On the other hand,
$E[U^{2q}]$ can be estimated accurately from a relatively small number of samples of $U$ if
$a_1$ is away from 0 and, for example, $q \le 2$, that is, moments of $U$ up to order 4.


8.2.3 Stochastic Reduced Order Model Method 


Let $X$ be an $\mathbb{R}^{d_x}$-valued random variable including all random parameters in the
definition of the stochastic algebraic equation $AU = B$ given by (8.1). The idea is to
approximate $U$ by the solution $\tilde U$ of a version of (8.1) obtained by replacing $X$ with
a stochastic reduced order model (SROM) $\tilde X$ of it, that is, a simple random vector
taking $m \ge 1$ distinct values $(\tilde x^{(1)}, \ldots, \tilde x^{(m)})$ in the range $\Gamma = X(\Omega)$ of $X$ with
probabilities $(p_1, \ldots, p_m)$. The defining parameters $(\tilde x^{(k)}, p_k)$, $k = 1, \ldots, m$, of $\tilde X$
are such that the probability laws of $X$ and $\tilde X$ are similar in some sense (Sect. A.3).
Bounds on statistics of $\|U - \tilde U\|$ are developed and used to measure the performance
of $\tilde U$. It is assumed as in the previous section that $A$ is invertible a.s.

Two SROM-based methods for solving SAEs are discussed. The first method
approximates the solution $U$ of a SAE by a SROM $\tilde U$ with samples $\{\tilde u^{(k)}\}$ obtained by
solving deterministic versions of (8.1) with samples $\{\tilde x^{(k)}\}$ of a SROM $\tilde X$ of $X$ in place
of $X$. The samples $\{\tilde u^{(k)}\}$ have the same probabilities $\{p_k\}$ as the samples of $\tilde X$. The
second method approximates the mapping $X \mapsto U$ by a piecewise linear function
given by hyperplanes tangent to it at $\{\tilde x^{(k)}\}$. The resulting representation for $X \mapsto U$
is more accurate than that in the first method, but its implementation is less simple. In
addition to $\{\tilde u^{(k)}\}$, we need to find the gradients of $U$ with respect to the coordinates of
$X$ at $\{\tilde x^{(k)}\}$, construct a Voronoi tessellation with centers $\{\tilde x^{(k)}\}$ in the range $\Gamma = X(\Omega)$
of $X$, and estimate statistics of $U$ from a piecewise linear representation of the mapping
$X \mapsto U$ by Monte Carlo simulation. We refer to the first and second methods as
SROM- and extended stochastic reduced order model (ESROM)-based solutions
(Sects. A.3 and A.4).


8.2.3.1 Stochastic Reduced Order Models (SROMs) 

The construction of SROM-based solutions for (8.1) involves three steps. First, a
SROM $\tilde X$ of $X$ needs to be constructed. Optimization algorithms with objective
functions measuring the discrepancy between properties of $\tilde X$ and $X$ under the
constraints $p_k \ge 0$, $k = 1, \ldots, m$, and $\sum_{k=1}^m p_k = 1$ can be used to find $(\tilde x^{(k)}, p_k)$,
$k = 1, \ldots, m$ (Sect. A.3). The defining parameters of $\tilde X$ depend solely on the
probability law of $X$ and their determination does not involve solutions of (8.1).

Second, $m$ solutions $\tilde u^{(k)}$ of deterministic versions of (8.1) with $X$ set equal to
$\tilde x^{(k)}$, $k = 1, \ldots, m$, need to be calculated. These solutions and the probabilities of $\tilde X$
define a SROM $\tilde U$ for $U$. The defining parameters of $\tilde U$ are $(\tilde u^{(k)}, p_k)$, $k = 1, \ldots, m$.

Third, $U$ is approximated by $\tilde U$. Properties of $\tilde U$ can be obtained by elementary
calculations. For example, the distribution $P(\tilde U_j \le z)$ and the moment of order
$r \ge 1$ of coordinate $\tilde U_j$, $j = 1, \ldots, d$, of $\tilde U$ are $P(\tilde U_j \le z) = \sum_{k=1}^m 1(\tilde u_j^{(k)} \le z)\, p_k$
and $E[\tilde U_j^r] = \sum_{k=1}^m (\tilde u_j^{(k)})^r p_k$, respectively.
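The second and third steps reduce to a loop over deterministic solves followed by weighted sums. The sketch below assumes that a SROM of $X$ is already available; the sample values, the equal probabilities (not optimized), and the helper names are hypothetical and serve only to illustrate the formulas for $P(\tilde U_j \le z)$ and $E[\tilde U_j^r]$.

```python
import numpy as np

def srom_solution(x_tilde, solve):
    """Step 2: one deterministic solve per SROM sample of X; returns an (m, d) array."""
    return np.array([solve(x) for x in x_tilde])

def srom_moment(u_tilde, p, j, r):
    """Step 3: moment of order r of coordinate j of the SROM of U."""
    return float(np.sum(p * u_tilde[:, j] ** r))

def srom_cdf(u_tilde, p, j, z):
    """Step 3: distribution of coordinate j of the SROM of U evaluated at z."""
    return float(np.sum(p * (u_tilde[:, j] <= z)))

# Scalar illustration AU = 1 with A = X; samples and probabilities are assumed.
x_tilde = np.array([1.2, 1.6, 2.0, 2.4, 2.8])
p = np.full(5, 0.2)
u_tilde = srom_solution(x_tilde, lambda x: np.array([1.0 / x]))

print(srom_moment(u_tilde, p, j=0, r=1))
print(srom_cdf(u_tilde, p, j=0, z=0.55))
```

The callable `solve` stands for any deterministic solver of (8.1), so the same functions apply unchanged to matrix equations.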

We have seen that the construction of a SROM $\tilde U$ for $U$ involves solutions of
deterministic versions of (8.1) with $X$ set equal to points in $\Gamma = X(\Omega)$. The
implementation of the Monte Carlo simulation and stochastic collocation methods also
involves solutions of deterministic versions of (8.1). The points in $\Gamma$ used by both
Monte Carlo and SROM-based solutions depend on the probability law of $X$. On the
other hand, stochastic collocation solutions use points in $\Gamma$ that are unrelated to the
probability law of $X$ (Sect. 8.2.5).

Example 8.5 Consider the stochastic algebraic equation $AU = 1$ with $A = X \sim U(a, b)$.
Let $\tilde x^{(1)} < \cdots < \tilde x^{(m)}$ be $m$ equally spaced points in $(a, b)$ with
$\tilde x^{(1)} - a = b - \tilde x^{(m)} = \varepsilon > 0$. To complete the definition of $\tilde X$, we need to specify the
probabilities $(p_1, \ldots, p_m)$ of the samples of $\tilde X$. The accuracy of SROM solutions
of (8.1) depends on the defining parameters $(\tilde x^{(k)}, p_k)$, $k = 1, \ldots, m$, of $\tilde X$. For
example, the errors in the mean and the standard deviation of $\tilde U$ are −42.27% and
−81.78% for $(a, b) = (0.01, 3)$, $\varepsilon = 0.2$, and $m = 4$. They decrease to 2.76% and
−15.89%, respectively, for $\varepsilon = 0.03$ and $m = 20$. The errors of the mean and the
standard deviation of $\tilde U$ are 0.0385% and −1.2980% for $(a, b) = (1, 3)$, $\varepsilon = 0.2$,
and $m = 4$. ◊

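Proof The probabilities $(p_1, \ldots, p_m)$ of the range $(\tilde x^{(1)}, \ldots, \tilde x^{(m)})$ of $\tilde X$ have been
selected to minimize the objective function

$$
e = \alpha_1 \int_a^b \big(\tilde F(x) - F(x)\big)^2\, dx + \alpha_2 \sum_{r=1} \big(E[\tilde X^r] - E[X^r]\big)^2, \tag{8.10}
$$

under the constraints $p_k \ge 0$, $k = 1, \ldots, m$, and $\sum_{k=1}^m p_k = 1$, where $\alpha_1, \alpha_2 > 0$
are constants, $F(x) = P(X \le x) = (x - a)/(b - a)$, $x \in (a, b)$,
$\tilde F(x) = P(\tilde X \le x) = \sum_{k=1}^m p_k\, 1(\tilde x^{(k)} \le x)$, $E[X^r] = (b^{r+1} - a^{r+1})/\big((b - a)(r + 1)\big)$, and
$E[\tilde X^r] = \sum_{k=1}^m (\tilde x^{(k)})^r p_k$. The distribution and the moments of the SROM $\tilde U$ of $U$ are
$P(\tilde U \le u) = \sum_{k=1}^m p_k\, 1(\tilde u^{(k)} \le u)$ and $E[\tilde U^r] = \sum_{k=1}^m (\tilde u^{(k)})^r p_k$, where $\tilde u^{(k)}$ is the solution
of $AU = 1$ for $X = \tilde x^{(k)}$. Numerical results in this example are for $\alpha_1 = \alpha_2 = 1$. ▲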
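The objective function (8.10) is straightforward to evaluate for a candidate set of probabilities. The sketch below only evaluates it and does not perform the constrained minimization; the number of moments `r_max`, the grid size, and the equal probabilities are assumptions introduced here, since the source does not state them.

```python
import numpy as np

def srom_objective(x_tilde, p, a, b, alpha1=1.0, alpha2=1.0, r_max=6, n_grid=20000):
    """Evaluate (8.10) for X ~ U(a, b); r_max and n_grid are assumed values."""
    x_tilde = np.asarray(x_tilde, dtype=float)      # must be sorted increasingly
    p = np.asarray(p, dtype=float)
    h = (b - a) / n_grid
    g = a + (np.arange(n_grid) + 0.5) * h           # midpoint integration grid
    F = (g - a) / (b - a)                           # target distribution
    cum = np.concatenate(([0.0], np.cumsum(p)))
    F_tilde = cum[np.searchsorted(x_tilde, g, side="right")]   # SROM distribution
    cdf_term = np.sum((F_tilde - F) ** 2) * h
    r = np.arange(1, r_max + 1)
    exact = (b ** (r + 1.0) - a ** (r + 1.0)) / ((b - a) * (r + 1.0))
    approx = np.array([np.sum(p * x_tilde ** k) for k in r])
    return alpha1 * cdf_term + alpha2 * np.sum((approx - exact) ** 2)

def equally_spaced(a, b, eps, m):
    """Samples with x(1) - a = b - x(m) = eps, as in Example 8.5."""
    return np.linspace(a + eps, b - eps, m)

e4 = srom_objective(equally_spaced(1.0, 3.0, 0.25, 4), np.full(4, 0.25), 1.0, 3.0)
e20 = srom_objective(equally_spaced(1.0, 3.0, 0.05, 20), np.full(20, 0.05), 1.0, 3.0)
print(e4, e20)
```

A constrained optimizer over $(p_1, \ldots, p_m)$ would call `srom_objective` repeatedly; with equal probabilities the objective already decreases markedly from $m = 4$ to $m = 20$ in this setting.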

The stochastic algebraic equation in Example 8.5 is also solved in Examples 8.15
and 8.16 by the stochastic Galerkin and collocation methods. The accuracy of the
approximate means and standard deviations by all these methods is remarkable for
$(a = 1, b = 3)$ but varies from method to method for $(a = 0.01, b = 3)$.

The bounds in Theorems 8.2 and 8.3 developed for deterministic algebraic
equations corresponding to samples of $X$ and $\tilde X$ can be extended to stochastic algebraic
equations. Let $\{x^{(i)}, i = 1, \ldots, n\}$ be a collection of independent samples of $X$ that
is sufficiently large to provide an accurate characterization for the probability law
of $X$. Let $\{\mathcal{C}_k, k = 1, \ldots, m\}$, $m \ll n$, be a partition of $(x^{(1)}, \ldots, x^{(n)})$ such that
$p_k = n_k/n$, where $n_k$ denotes the cardinality of $\mathcal{C}_k$. Let $\zeta$ be a measurable function
mapping the members of $\mathcal{C}_k$ into $\tilde x^{(k)}$, that is, $\zeta(x^{(i)}) = \tilde x^{(k)}$ for $x^{(i)} \in \mathcal{C}_k$. The
mapping $\zeta$ can be constructed by an algorithm in Sect. A.3 that is also outlined here
for convenience.

Suppose $X$ is described satisfactorily by $n$ independent samples $(x^{(1)}, \ldots, x^{(n)})$.
This set of samples can be partitioned by the following two-step procedure. In the first
step, we construct a partition $\{\mathcal{C}'_k\}$ by assigning $x^{(i)}$ to $\mathcal{C}'_k$ if it is closer to $\tilde x^{(k)}$ than any
other $\tilde x^{(l)}$, $l \ne k$, that is, $x^{(i)}$ is assigned to $\mathcal{C}'_k$ if $d(x^{(i)}, \tilde x^{(k)}) < d(x^{(i)}, \tilde x^{(l)})$, $l \ne k$,
where $d$ is a metric in $\Gamma = X(\Omega)$. If $d(x^{(i)}, \tilde x^{(k)}) = d(x^{(i)}, \tilde x^{(l)})$ for two distinct
indices $(k, l)$, then $x^{(i)}$ is assigned to either $\mathcal{C}'_k$ or $\mathcal{C}'_l$. Generally, the cardinalities $n'_k$
of the resulting clusters $\mathcal{C}'_k$ do not satisfy the condition $p_k = n'_k/n$. In the second
step, we eliminate the members of the clusters $\mathcal{C}'_k$ with $n'_k/n > p_k$ that are the
farthest from $\tilde x^{(k)}$ until the reduced versions $\mathcal{C}''_k$ of these clusters satisfy the condition
$n''_k/n = p_k$, where $n''_k$ denotes the cardinality of $\mathcal{C}''_k$. The members extracted from
the clusters $\mathcal{C}'_k$ with $n'_k/n > p_k$ are assigned to the clusters with $n'_k/n < p_k$ based
on their closeness to the nuclei $\tilde x^{(k)}$ of these clusters and the requirement $n''_k/n = p_k$.
The algorithm delivers a partition $\{\mathcal{C}_k, k = 1, \ldots, m\}$ of $(x^{(1)}, \ldots, x^{(n)})$ such that
the members of $\mathcal{C}_k$ are mapped into $\tilde x^{(k)}$ and the probability that $X$ takes values
in $\mathcal{C}_k$ is $p_k \simeq n_k/n$.
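The two-step procedure can be sketched as follows. This is one possible reading of the algorithm, not the implementation of Sect. A.3: the rounding of $n p_k$ to integer cluster sizes and the greedy reassignment of extracted members are assumptions, as are the function name and the test data.

```python
import numpy as np

def balanced_partition(samples, centers, p):
    """Assign each sample to a cluster so that cluster k has about n * p[k] members."""
    n, m = len(samples), len(centers)
    dist = np.linalg.norm(samples[:, None, :] - centers[None, :, :], axis=2)

    # Step 1: nearest-center assignment (ties go to the lowest index).
    labels = dist.argmin(axis=1)

    # Integer target sizes (assumed rounding rule: largest fractional parts first).
    target = np.floor(p * n).astype(int)
    order = np.argsort(-(p * n - target))
    target[order[: n - target.sum()]] += 1

    # Step 2a: extract the farthest members of oversized clusters.
    pool = []
    for k in range(m):
        members = np.flatnonzero(labels == k)
        excess = len(members) - target[k]
        if excess > 0:
            far = members[np.argsort(-dist[members, k])[:excess]]
            pool.extend(far.tolist())
            labels[far] = -1

    # Step 2b: give each extracted member to the closest cluster that is still short.
    counts = np.bincount(labels[labels >= 0], minlength=m)
    for i in pool:
        short = np.flatnonzero(counts < target)
        k = short[np.argmin(dist[i, short])]
        labels[i] = k
        counts[k] += 1
    return labels

rng = np.random.default_rng(2)
samples = rng.uniform(1.0, 3.0, size=(1000, 1))        # assumed samples of X
centers = np.array([[1.2], [1.5], [2.0], [2.8]])       # assumed SROM samples
p = np.array([0.25, 0.25, 0.25, 0.25])                 # assumed probabilities
labels = balanced_partition(samples, centers, p)
print(np.bincount(labels, minlength=4))
```

The Euclidean norm plays the role of the metric $d$; any other metric in $\Gamma$ could be substituted in the computation of `dist`.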

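With these considerations the bounds in (8.5) and (8.6) extend directly to stochastic
algebraic equations. For example, the expectation of the discrepancy $\|U - \tilde U\|$
between the exact and the approximate solutions of (8.1) can be bounded by

$$
\begin{aligned}
E[\|U - \tilde U\|] &\le \frac{1}{n} \sum_{k=1}^m \sum_{x^{(i)} \in \mathcal{C}_k} \frac{1}{\lambda_{\max}} \big\| -\Delta A^{(k,i)}\, \tilde u^{(k)} + \Delta B^{(k,i)} \big\| \\
&\le \frac{1}{n} \sum_{k=1}^m \sum_{x^{(i)} \in \mathcal{C}_k} \frac{1}{\lambda_{\max}} \Big( \|\Delta A^{(k,i)}\|\, \|(\tilde A^{(k)})^{-1}\|\, \|\tilde B^{(k)}\| + \|\Delta B^{(k,i)}\| \Big)
\end{aligned} \tag{8.11}
$$

by (8.6). Similar bounds can be constructed for other statistics of the discrepancy
between $U$ and $\tilde U$.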

The bounds in Theorems 8.1, 8.2, and 8.3 on the discrepancy between properties
of $U$ and $\tilde U$ show that the approximate solutions $\tilde U$ converge to the exact solution
$U$ of (8.1) as the discrepancy between $(A, B)$ and $(\tilde A, \tilde B)$ vanishes. However, their
calculation is impractical, as illustrated by the bound on $E[\|U - \tilde U\|]$ in (8.11).

Theorem 8.4 Let $g : \mathbb{R}^d \to \mathbb{R}$ be a differentiable function such that $\|\nabla g(u)\| \le M$,
where $M > 0$ is a constant. If the mapping $X \mapsto U$ defined by (8.1) is Lipschitz,
that is, solutions $u'$, $u''$ of (8.1) with $X$ equal to $x', x'' \in \Gamma$ are such that
$\|u' - u''\| \le c\,\|x' - x''\|$ for a constant $c > 0$, the discrepancy between the probabilities
$P(g(U) \le z)$ and $P(g(\tilde U) \le z)$ can be bounded by

$$
\big|P(g(U) \le z) - P(g(\tilde U) \le z)\big| \le \frac{1}{n} \sum_{k=1}^m \sum_{x^{(i)} \in \mathcal{C}_k} \xi^{(k,i)}(z), \tag{8.12}
$$

where $\xi^{(k,i)}(z) = \big|1(g(u^{(i)}) \le z) - 1(g(\tilde u^{(k)}) \le z)\big| \le \big|1\big(cM\|x^{(i)} - \tilde x^{(k)}\|\,\mathrm{sign}(z - g(\tilde u^{(k)})) \le z - g(\tilde u^{(k)})\big) - 1(g(\tilde u^{(k)}) \le z)\big|$.

Proof As previously, let $u^{(i)}$ and $\tilde u^{(k)}$ be solutions of (8.1) for $X = x^{(i)}$, $i = 1, \ldots, n$,
and $X = \tilde x^{(k)}$, $k = 1, \ldots, m$, respectively. For $x^{(i)} \in \mathcal{C}_k$ there exists $x^{(k,i)}$ on the
segment connecting $\tilde u^{(k)}$ to $u^{(i)}$ such that $g(u^{(i)}) = g(\tilde u^{(k)}) + J$ with
$J = \nabla g(x^{(k,i)}) \cdot (u^{(i)} - \tilde u^{(k)})$ by the mean value theorem. Suppose $g(\tilde u^{(k)}) \le z$ so that
$1(g(\tilde u^{(k)}) \le z) = 1$. If $J \le z - g(\tilde u^{(k)})$, then $1(g(u^{(i)}) \le z) = 1$ so that $\xi^{(k,i)}(z) = 0$. Otherwise,
$1(g(u^{(i)}) \le z) = 0$ and $\xi^{(k,i)}(z) = 1$. We construct a lower bound on $1(g(u^{(i)}) \le z)$
which will yield an upper bound on $\xi^{(k,i)}(z)$. Since
$|J| \le \|\nabla g(x^{(k,i)})\|\, \|u^{(i)} - \tilde u^{(k)}\| \le cM\|x^{(i)} - \tilde x^{(k)}\|$, implying $J \le |J| \le cM\|x^{(i)} - \tilde x^{(k)}\|$, we have
$1(g(u^{(i)}) \le z) \ge 1\big(cM\|x^{(i)} - \tilde x^{(k)}\| \le z - g(\tilde u^{(k)})\big)$ so that
$\xi^{(k,i)}(z) \le \big|1\big(cM\|x^{(i)} - \tilde x^{(k)}\| \le z - g(\tilde u^{(k)})\big) - 1(g(\tilde u^{(k)}) \le z)\big|$. Similar considerations for the case $g(\tilde u^{(k)}) > z$
give $1(g(u^{(i)}) \le z) \le 1\big(cM\|x^{(i)} - \tilde x^{(k)}\| \ge g(\tilde u^{(k)}) - z\big)$ (Exercise 8.3). ▲


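Example 8.6 Under the conditions of Theorem 8.4 and the assumption that the
random variables $g(U)$ and $g(\tilde U)$ have finite expectations, we have

$$
\big|E[g(U)] - E[g(\tilde U)]\big| = \big|E[g(U) - g(\tilde U)]\big| \le \frac{cM}{n} \sum_{k=1}^m \sum_{x^{(i)} \in \mathcal{C}_k} \|x^{(i)} - \tilde x^{(k)}\|, \tag{8.13}
$$

where the constants $c, M > 0$ are as in Theorem 8.4. ◊

Proof Since $g(u^{(i)}) - g(\tilde u^{(k)}) = \nabla g(x^{(k,i)}) \cdot (u^{(i)} - \tilde u^{(k)})$, we have
$|g(u^{(i)}) - g(\tilde u^{(k)})| \le \|\nabla g(x^{(k,i)})\|\, \|u^{(i)} - \tilde u^{(k)}\|$, where $x^{(k,i)}$ is on the segment with
ends $u^{(i)}$ and $\tilde u^{(k)}$. This bound on $|g(u^{(i)}) - g(\tilde u^{(k)})|$ and
$\big|E[g(U) - g(\tilde U)]\big| = (1/n) \big|\sum_{k=1}^m \sum_{x^{(i)} \in \mathcal{C}_k} \big(g(u^{(i)}) - g(\tilde u^{(k)})\big)\big|$ give (8.13). ▲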

Example 8.7 Consider the stochastic algebraic equation (8.1) with $d = 1$, $A = X$ a
random variable uniformly distributed in $(a, b)$, and $B = 1$. Three SROMs $\tilde X$ with
$m = 10$, 20, and 40 samples have been constructed for $X$. The range of these models
is as in Example 8.5, that is, $\tilde x^{(1)} = a + \varepsilon$, $\tilde x^{(m)} = b - \varepsilon$, and the rest of the samples
of $\tilde X$ are equally spaced in the range $(\tilde x^{(1)}, \tilde x^{(m)})$ with $\varepsilon = 0.1$, 0.03, and 0.018 for
$m = 10$, 20, and 40, respectively. The scaled version of the upper bound in (8.13),
that is, $(1/n) \sum_{k=1}^m \sum_{x^{(i)} \in \mathcal{C}_k} \|x^{(i)} - \tilde x^{(k)}\|$, is equal to 0.0944, 0.0644, and 0.0585
for $n = 1000$ and $m = 10$, 20, and 40, respectively. The cardinalities of the largest
cluster $\mathcal{C}_k$ are 107, 54, and 26 for $m = 10$, 20, and 40. ◊

The bounds in (8.12) and (8.13) do not account for statistical uncertainty since
their construction assumes that $X$ is a simple $\mathbb{R}^{d_x}$-valued random variable with
equally likely samples $(x^{(1)}, \ldots, x^{(n)})$. These bounds can be modified to account
for the fact that the estimated properties of $X$ depend on the particular samples
$(x^{(1)}, \ldots, x^{(n)})$ considered in analysis and the sample size $n$. For example, let
$\hat I_P(z) = (1/n) \sum_{i=1}^n 1(g(U^{(i)}) \le z)$ and $\hat I_U = (1/n) \sum_{i=1}^n g(U^{(i)})$ be estimators
of $P(g(U) \le z)$ and $E[g(U)]$, respectively, where $U^{(i)}$ are independent copies of $U$.
Since the mean and variance of, for example, the estimator $\hat I_U = (1/n) \sum_{i=1}^n g(U^{(i)})$
are $E[g(U)]$ and $\mathrm{Var}[g(U)]/n$, respectively, we conclude that $\hat I_U$ is an unbiased
estimator for $E[g(U)]$ with the property $P\big(|\hat I_U - E[g(U)]| > \varepsilon\big) \le \mathrm{Var}[g(U)]/(n \varepsilon^2)$
that holds for arbitrary $\varepsilon > 0$ by Chebyshev's inequality.

The contribution of statistical uncertainty to the previous bounds on the
discrepancy between $U$ and $\tilde U$ can be incorporated simply. We have

$$
\begin{aligned}
\big|P(g(U) \le z) - P(g(\tilde U) \le z)\big| &\le \big|P(g(U) \le z) - \hat I_P(z)\big| + \big|\hat I_P(z) - P(g(\tilde U) \le z)\big| \\
\big|E[g(U)] - E[g(\tilde U)]\big| &\le \big|E[g(U)] - \hat I_U\big| + \big|\hat I_U - E[g(\tilde U)]\big|,
\end{aligned} \tag{8.14}
$$

where the terms $|\hat I_P(z) - P(g(\tilde U) \le z)|$ and $|\hat I_U - E[g(\tilde U)]|$ are those in (8.12)
and (8.13) while the terms $|P(g(U) \le z) - \hat I_P(z)|$ and $|E[g(U)] - \hat I_U|$ relate to
statistical uncertainty. The latter terms are defined as in (8.8) and (8.9), and vanish
as $n \to \infty$.

It is possible to replace the distribution $\tilde F(u) = \sum_{k=1}^m p_k\, 1(\tilde u^{(k)} \le u)$ of $\tilde U$
by a smooth version of it. The substitution principle suggests smoothing $\tilde F(u)$ by
convolving it with a continuous distribution $G$ with probability mass concentrated at
the origin of $\mathbb{R}^d$ ([16], Sect. 4.5). For $d = 1$ the smooth version of $\tilde F$ has the form

$$
\tilde F_{\text{smooth}}(u) = \int G(u - v)\, d\tilde F(v) = \sum_{k=1}^m p_k\, G(u - \tilde u^{(k)}). \tag{8.15}
$$

If $G$ is concentrated at 0, the discrepancy $|\tilde F(u) - \tilde F_{\text{smooth}}(u)|$ between the
distributions $\tilde F$ and $\tilde F_{\text{smooth}}$ is small. If $G$ is differentiable, $\tilde F_{\text{smooth}}$ has a density
$\tilde f_{\text{smooth}}(u) = \sum_{k=1}^m p_k\, g(u - \tilde u^{(k)})$, where $g$ denotes the derivative of $G$.
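The smoothing in (8.15) can be illustrated with a specific kernel. The sketch below takes $G$ to be the distribution of a zero-mean Gaussian variable with a small standard deviation; this kernel, the bandwidth, and the SROM values are assumptions chosen for illustration, since the text only requires $G$ to be continuous and concentrated at the origin.

```python
import math
import numpy as np

def smooth_cdf(u, u_tilde, p, bandwidth):
    """F_smooth(u) = sum_k p_k G(u - u_k) with G the N(0, bandwidth^2) distribution."""
    z = (u - np.asarray(u_tilde, dtype=float)) / bandwidth
    G = 0.5 * (1.0 + np.array([math.erf(v / math.sqrt(2.0)) for v in z]))
    return float(np.sum(np.asarray(p) * G))

def smooth_pdf(u, u_tilde, p, bandwidth):
    """f_smooth(u) = sum_k p_k g(u - u_k) with g the derivative of G."""
    z = (u - np.asarray(u_tilde, dtype=float)) / bandwidth
    g = np.exp(-0.5 * z ** 2) / (bandwidth * math.sqrt(2.0 * math.pi))
    return float(np.sum(np.asarray(p) * g))

u_tilde = [0.4, 0.5, 0.8]        # assumed SROM samples of U
p = [0.2, 0.5, 0.3]              # assumed probabilities
print(smooth_cdf(0.6, u_tilde, p, bandwidth=0.01))   # close to the step value 0.7
print(smooth_pdf(0.5, u_tilde, p, bandwidth=0.01))
```

Away from the jumps of $\tilde F$ the smooth and the step distributions nearly coincide when the bandwidth is small, in line with the remark following (8.15).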

We conclude this section with two examples. The first example solves (8.1) with

$$
A = \begin{bmatrix}
X_1 + X_2 & -X_2 & 0 \\
-X_2 & X_2 + X_3 & -X_3 \\
0 & -X_3 & X_3
\end{bmatrix} \tag{8.16}
$$

by SROMs, where $X_i = F^{-1} \circ \Phi(G_i)$, $F$ is a Gamma distribution with shift, shape,
and decay parameters $a = 1$, $k = 2$, and $\eta = 3$, and $\{G_i\}$ are standard Gaussian
variables with $E[G_i G_j] = \rho^{|i-j|}$, $i, j = 1, 2, 3$. The right side $B$ of (8.1) is the unit
vector in $\mathbb{R}^3$. The second example solves the eigenvalue problem $AU = \Lambda U$.

Example 8.8 Let $U \in \mathbb{R}^3$ be the solution of (8.1) with $A$ in (8.16) and $B$ the unit
vector in $\mathbb{R}^3$. A SROM $\tilde X = (\tilde X_1, \tilde X_2, \tilde X_3)$ of $X = (X_1, X_2, X_3)$ with dimension
$m = 10$ has been constructed. The first three columns and the last column of matrix $a$
in (8.17) give the samples $(\tilde x_1^{(k)}, \tilde x_2^{(k)}, \tilde x_3^{(k)})$, $k = 1, \ldots, m$, of $\tilde X$ and the probabilities
$p_k$, $k = 1, \ldots, m$, respectively, for $\rho = 0.9$. The first sample of $\tilde X$ has been selected
to coincide with the smallest values of $X_j$, $j = 1, 2, 3$.

$$
a = \begin{bmatrix}
1.0000 & 1.0000 & 1.0000 & 0.0482 \\
2.0110 & 1.8888 & 2.1573 & 0.0969 \\
1.8569 & 1.5090 & 1.2753 & 0.0603 \\
1.4923 & 1.3309 & 1.7060 & 0.1146 \\
1.2891 & 1.4006 & 1.4669 & 0.2010 \\
1.3987 & 1.1906 & 1.1748 & 0.0965 \\
2.2620 & 2.1742 & 2.2356 & 0.0643 \\
2.5874 & 2.7606 & 2.1229 & 0.1351 \\
1.7373 & 1.7505 & 1.6375 & 0.1831 \\
2.3473 & 2.1110 & 1.8373 & 0.0001
\end{bmatrix} \tag{8.17}
$$

The samples $(\tilde u^{(1)}, \ldots, \tilde u^{(m)})$ of the SROM $\tilde U$ have been obtained by solving ten
deterministic problems (8.1) with $X = \tilde x^{(k)}$. Moments of any order $r$ of the
coordinates of $\tilde U$ can be calculated simply from $E[\tilde U_j^r] = \sum_{k=1}^m (\tilde u_j^{(k)})^r p_k$. The moments
of the coordinates of $\tilde U$ up to order 4 are in error by less than 8% relative to Monte
Carlo estimates of these moments based on 1000 independent samples of $X$. ◊

Fig. 8.3 Distributions of the eigenvalues of $A$ based on a SROM for $X$ with $m = 20$ (solid lines)
and Monte Carlo simulation (dotted lines) for $\rho = 0.2$ (left panel) and $\rho = 0.9$ (right panel)
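The calculations in Example 8.8 can be retraced from the tabulated SROM in (8.17). In the sketch below the right side is taken as $B = (0, 0, 1)$, which is an assumption about what "the unit vector" means, and the probabilities are renormalized because the rounded table entries sum to 1.0001; the Monte Carlo comparison reported in the example is not reproduced.

```python
import numpy as np

# Rows of (8.17): three coordinates of each SROM sample of X, then its probability.
srom = np.array([
    [1.0000, 1.0000, 1.0000, 0.0482],
    [2.0110, 1.8888, 2.1573, 0.0969],
    [1.8569, 1.5090, 1.2753, 0.0603],
    [1.4923, 1.3309, 1.7060, 0.1146],
    [1.2891, 1.4006, 1.4669, 0.2010],
    [1.3987, 1.1906, 1.1748, 0.0965],
    [2.2620, 2.1742, 2.2356, 0.0643],
    [2.5874, 2.7606, 2.1229, 0.1351],
    [1.7373, 1.7505, 1.6375, 0.1831],
    [2.3473, 2.1110, 1.8373, 0.0001],
])
x_tilde, p = srom[:, :3], srom[:, 3]
p = p / p.sum()                      # remove the rounding error of the table

def matrix_A(x):
    """Matrix A in (8.16) for X = x."""
    x1, x2, x3 = x
    return np.array([[x1 + x2, -x2, 0.0],
                     [-x2, x2 + x3, -x3],
                     [0.0, -x3, x3]])

B = np.array([0.0, 0.0, 1.0])        # assumed right side

# Ten deterministic solves give the samples of the SROM of U.
u_tilde = np.array([np.linalg.solve(matrix_A(x), B) for x in x_tilde])

# Moments of order r = 1, ..., 4 of the three coordinates.
moments = {r: (p[:, None] * u_tilde ** r).sum(axis=0) for r in range(1, 5)}
print(moments[1], moments[2])
```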

Example 8.9 Random eigenvalue problems can be solved by SROMs in a similar
manner. First, a SROM $\tilde X$ with parameters $(\tilde x^{(k)}, p_k)$, $k = 1, \ldots, m$, is constructed
for $X = (X_1, X_2, X_3)$. Second, the eigenvalues $\{\lambda_j^{(k)}\}$, $j = 1, 2, 3$, of $A$ with
$X = \tilde x^{(k)}$, $k = 1, \ldots, m$, are calculated, so that $m$ distinct deterministic eigenvalue
problems need to be solved. Third, SROMs are assembled for the random eigenvalues
$\{\Lambda_j\}$ of $A$ from the eigenvalues $\{\lambda_j^{(k)}\}$, $j = 1, 2, 3$, and their probabilities $p_k$,
$k = 1, \ldots, m$.

The dotted lines in Fig. 8.3 are Monte Carlo estimates of the distributions of the
eigenvalues $\{\Lambda_j\}$ obtained from 1000 independent samples of $A$. The solid lines are
the distributions of the eigenvalues $\{\Lambda_j\}$ based on a SROM $\tilde X$ with $m = 20$ samples.
The left and right panels in the figure are for correlation coefficients $\rho = 0.2$ and
$\rho = 0.9$, respectively. The maximum error of the first four moments of $\{\Lambda_j\}$ with
respect to corresponding Monte Carlo estimates based on 1000 independent samples
is under 5.38% for $m = 10$ and 3.97% for $m = 20$ if $\rho = 0.2$ and under 4.47% for
$m = 10$ and 4.11% for $m = 20$ if $\rho = 0.9$. ◊


8.2.3.2 Extended Stochastic Reduced Order Models (ESROMs) 

Technical details on extended stochastic reduced order models (ESROMs) can be
found in Sect. A.4. The ESROM-based solution for SAEs is based on the piecewise
linear representation

$$
U_L(X) = \sum_{k=1}^m \big[\tilde u_k + \nabla \tilde u_k \cdot (X - \tilde x_k)\big]\, 1(X \in \Gamma_k) \tag{8.18}
$$

of mapping $X \mapsto U$, where $\{\tilde u_k\}$ are solutions of (8.1) for $X$ replaced with the
samples $\{\tilde x_k\}$ of a SROM $\tilde X$ of $X$, $\{\nabla \tilde u_k = (\partial U/\partial x_1, \ldots, \partial U/\partial x_{d_x})\}$ denote gradients
of $U$ at $\{X = \tilde x_k\}$, and $\Gamma_k$ are the cells of a Voronoi tessellation in $\Gamma = X(\Omega)$ with
centers $\{\tilde x_k\}$. Note that $\{A_k = X^{-1}(\Gamma_k)\}$ is a measurable partition of $\Omega$.

Properties of $U_L$ can be calculated simply from its expression by Monte Carlo
simulation. For example, the moment of order $q \ge 1$ and the distribution of a coordinate
$U_{L,j}$, $j = 1, \ldots, d$, of $U_L$ can be estimated from

$$
\begin{aligned}
E[U_{L,j}^q] &\simeq \sum_{k=1}^m \frac{n_k}{n}\, \frac{1}{n_k} \sum_{x_i \in \Gamma_k} (\tilde u_{k,j} + w_{k,i,j})^q \quad \text{and} \\
P(U_{L,j} \le u) &\simeq \sum_{k=1}^m \frac{n_k}{n}\, \frac{1}{n_k} \sum_{x_i \in \Gamma_k} 1(\tilde u_{k,j} + w_{k,i,j} \le u),
\end{aligned} \tag{8.19}
$$

where $\tilde u_{k,j}$ and $w_{k,i,j}$ denote the $j$th coordinate of $\tilde u_k$ and $\nabla \tilde u_k \cdot (x_i - \tilde x_k)$, $\{x_i\}$ are $n$
independent samples of $X$, and $n_k$ denotes the number of samples $\{x_i\}$ in $\Gamma_k$. The
estimates in (8.19) follow from properties of conditional expectation. For $h : \mathbb{R}^d \to \mathbb{R}$
measurable, we have $E[h(U_L(X))] = E\{E[h(U_L(X)) \mid \mathcal{G}]\}$ and
$E[h(U_L(X)) \mid \mathcal{G}] = \sum_{k=1}^m \big[(1/P(A_k)) \int_{A_k} h(U_L(X))\, dP\big]\, 1_{A_k}$, where $\{A_k = X^{-1}(\Gamma_k)\}$ is a
measurable partition of $\Omega$ and $\mathcal{G} = \sigma(A_1, \ldots, A_m)$ (Sect. 2.8). Hence, $E[h(U_L(X))]$ can
be viewed as a sum of local averages $\{(1/P(A_k)) \int_{A_k} h(U_L(X))\, dP\}$ weighted by
probabilities $\{P(A_k)\}$. In (8.19), these local averages and probabilities are estimated
by $\{(1/n_k) \sum_{x_i \in \Gamma_k} h(U_L(x_i))\}$ and $\{n_k/n\}$, respectively.
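The structure of the estimates in (8.19) can be seen in a scalar setting. The sketch below uses the mapping $U = 1/X$ with $X$ uniform in $(1, 3)$ and ten equally spaced centers; the mapping, the distribution, the centers, the sample size, and the seed are all assumptions for illustration. It evaluates the piecewise linear representation (8.18) and forms the moment estimate both as a weighted sum of local averages and as a plain sample average.

```python
import numpy as np

def esrom_scalar(x, centers, u, du):
    """Evaluate U_L(x) of (8.18) for scalar X; cells are nearest-center intervals."""
    k = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
    return u[k] + du[k] * (x - centers[k]), k

rng = np.random.default_rng(1)
centers = np.linspace(1.1, 2.9, 10)        # assumed SROM samples of X
u = 1.0 / centers                          # solutions at the centers
du = -1.0 / centers ** 2                   # gradients at the centers
x = rng.uniform(1.0, 3.0, 20000)           # independent samples of X

u_L, k = esrom_scalar(x, centers, u, du)
n_k = np.bincount(k, minlength=len(centers))

q = 1
local_avg = np.array([np.mean(u_L[k == j] ** q) for j in range(len(centers))])
moment_cells = np.sum(n_k / len(x) * local_avg)    # form used in (8.19)
moment_flat = np.mean(u_L ** q)                    # equivalent plain average

print(moment_cells, moment_flat, np.log(3.0) / 2.0)   # last value is the exact E[U]
```

The two forms agree to rounding error because the weights $n_k/n$ cancel the factors $1/n_k$; the cell-wise form makes the conditional-expectation interpretation explicit.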

In the remainder of this section, we construct bounds on the discrepancy between
$U(X)$ and its first and higher order approximations, $U_L(X)$ and $U_q(X)$, and
solve a random eigenvalue problem and a SAE by ESROMs. The bounds result
from properties of the Taylor series. It is assumed that $U(X)$ is a real-valued
random variable. The extension to vector-valued solutions $U(X)$ poses no conceptual
difficulty.

Theorem 8.5 If $d_x = 1$, $U(X)$ has $q \ge 1$ continuous derivatives in $\Gamma$ a.s., $U^{(q+1)}$
exists in $\Gamma$ a.s., and $X$ has finite moments of order $q$, then

$$
\begin{aligned}
E[|U(X) - U_q(X)|] &\le \sum_{k=1}^m P(A_k)\, \frac{M_{q,k}}{(q+1)!}\, E\big[|X - \tilde x_k|^{q+1} \mid A_k\big] \\
&\le \frac{M_q}{(q+1)!} \sum_{k=1}^m P(A_k)\, E\big[|X - \tilde x_k|^{q+1} \mid A_k\big],
\end{aligned} \tag{8.20}
$$

where

$$
U_q(X) = \sum_{k=1}^m 1(X \in \Gamma_k) \sum_{r=0}^q \frac{U^{(r)}(\tilde x_k)}{r!}\, (X - \tilde x_k)^r, \tag{8.21}
$$

$|U^{(q+1)}(x)| \le M_{q,k}$ a.s. in $\Gamma_k$, $0 < M_{q,k} < \infty$, and $M_q \ge M_{q,k}$ for $k = 1, \ldots, m$.

Proof The Taylor expansion of order $q$ of $U(x)$ in $\Gamma_k$ about $\tilde x_k$ is
$U(x) = U_q(x) + \sum_{k=1}^m 1(x \in \Gamma_k)\, R_q(c_k, x)$, where $R_q(c_k, x) = [U^{(q+1)}(c_k)/(q+1)!]\,(x - \tilde x_k)^{q+1}$
is the remainder and $c_k \in \Gamma_k$ ([4], Theorem 21.1). We have

$$
E[|U(X) - U_q(X)|] = \sum_{k=1}^m P(A_k)\, E\left[\left|\frac{U^{(q+1)}(c_k)}{(q+1)!}\, (X - \tilde x_k)^{q+1}\right| \,\Big|\, A_k\right]
\le \sum_{k=1}^m P(A_k)\, \frac{M_{q,k}}{(q+1)!}\, E\big[|X - \tilde x_k|^{q+1} \mid A_k\big],
$$

which yields (8.20). The terms $P(A_k)\, E[|X - \tilde x_k|^{q+1} \mid A_k] = \int_{\Gamma_k} |x - \tilde x_k|^{q+1}\, dF(x)$
in (8.20) can be calculated by numerical integration or Monte Carlo simulation,
where $F$ denotes the distribution of $X$. ▲

Example 8.10 The bound in (8.20) can be used to assess the accuracy of, for
example, the piecewise linear and quadratic approximations $U_L(X) = U_1(X)$ and
$U_Q(X) = U_2(X)$. These bounds for the mapping $X \mapsto U = 1.25 - (X - 0.5)^2$, $X$
a real-valued Beta random variable with range $\Gamma = X(\Omega) = [a, b]$,
$\{\Gamma_k = [a + (k - 1)\Delta x,\, a + k \Delta x)\}$, $\Delta x = (b - a)/m$, shape parameters $(\xi, \eta)$, $a = 1/2$, $b = 4$,
$\xi = 1/2$, and $\eta = 3$ are

$$
E[|U(X) - U_L(X)|] \le \frac{M_1}{2} \sum_{k=1}^m \int_{\Gamma_k} |x - \tilde x_k|^2\, dF(x) \quad \text{and} \quad
E[|U(X) - U_Q(X)|] \le \frac{M_2}{6} \sum_{k=1}^m \int_{\Gamma_k} |x - \tilde x_k|^3\, dF(x),
$$

where $F$ denotes the distribution of $X$, $M_1 = 2$, and $M_2 = 0$. The discrepancy
$E[|U(X) - U_Q(X)|]$ is zero since the mapping $X \mapsto U(X)$ is quadratic. ◊
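The bounds of Example 8.10 can be checked by Monte Carlo simulation. In the sketch below the expansion points $\tilde x_k$ are taken as the cell midpoints and $m = 8$; both choices, the sample size, and the seed are assumptions, since the example does not state them. For this quadratic mapping the error of the piecewise linear approximation equals $|x - \tilde x_k|^2$, so the first bound holds with equality.

```python
import numpy as np

a, b, m = 0.5, 4.0, 8                 # range from Example 8.10; m is assumed
dx = (b - a) / m
centers = a + (np.arange(m) + 0.5) * dx    # assumed expansion points (cell midpoints)

def U(x):
    return 1.25 - (x - 0.5) ** 2

def dU(x):
    return -2.0 * (x - 0.5)

rng = np.random.default_rng(4)
x = a + (b - a) * rng.beta(0.5, 3.0, size=50000)    # Beta samples scaled to [a, b]
k = np.minimum(((x - a) / dx).astype(int), m - 1)   # index of the cell containing x
h = x - centers[k]

U_L = U(centers[k]) + dU(centers[k]) * h            # piecewise linear approximation
U_Q = U_L + 0.5 * (-2.0) * h ** 2                   # piecewise quadratic approximation

err_L = np.mean(np.abs(U(x) - U_L))
bound_L = (2.0 / 2.0) * np.mean(np.abs(h) ** 2)     # (M_1 / 2) sum_k int |x - x_k|^2 dF
err_Q = np.mean(np.abs(U(x) - U_Q))
print(err_L, bound_L, err_Q)
```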

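Theorem 8.6 If $d > 1$, $U(X)$ has continuous partial derivatives of order $q \ge 1$
in $\Gamma$ a.s., the partial derivatives of order $q + 1$ exist in $\Gamma$ a.s., and $X$ has finite
moments of order $q$, then

$$
E[|U(X) - U_q(X)|] \le \frac{d}{q + d} \sum_{k=1}^m M_{q,k}\, P(A_k)\, E\left[\frac{\|X - \tilde x_k\|^{q+d}}{\|X - c_k\|^{d-1}} \,\Big|\, A_k\right] \sum\nolimits^{(q)} \prod_{j=1}^d \frac{1}{q_j!}, \tag{8.22}
$$

where

$$
U_q(X) = \sum_{k=1}^m 1(X \in \Gamma_k) \sum\nolimits^{(\le q)} \left[\prod_{j=1}^d \frac{(X_j - \tilde x_{k,j})^{q_j}}{q_j!}\right] \frac{\partial^{\,q_1 + \cdots + q_d}\, U(\tilde x_k)}{\partial x_1^{q_1} \cdots \partial x_d^{q_d}}, \tag{8.23}
$$

$\sum^{(q)} = \sum_{q_1, \ldots, q_d \ge 0,\; q_1 + \cdots + q_d = q}$, $c_k$ is a point on the open interval with ends $\tilde x_k$
and $x$, $\sum^{(\le q)} = \sum_{q_1, \ldots, q_d \ge 0,\; q_1 + \cdots + q_d \le q}$,
$\big|\partial^{q+1} U(x)/(\partial x_1^{q_1} \cdots \partial x_j^{q_j + 1} \cdots \partial x_d^{q_d})\big| \le M_{q,k}$ for all $x \in \Gamma_k$, $0 < M_{q,k} < \infty$, and $\{x_j\}$ and $\{\tilde x_{k,j}\}$ are the coordinates of
$x$ and $\tilde x_k$.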

Proof The multivariate Taylor formula at $x \in \mathbb{R}^d$ at expansion point $\tilde x_k$ has the form
$U(x) = U_q(x) + \sum_{k=1}^m 1(x \in \Gamma_k)\, R_q(c_k, x)$, where


Rq(c k ,x) 


zf = 1 «.-iic* -2*1111* -i*ir[z (,) (n-=i %) 


311+ +1dU(c k ) 


r li+' 


■9xj 


(q 


N(d) Z-i=lW T 1 t|| A _ fl ,|| 
II* — X k \\‘ ,+d ^ 


JlZzlilL v^) ! i 1 ?+ rf_1 


\\X-Xk 
i( q ) 


d 

if 


Ml*-** II' 

d“ +1 U(c k ) 


dx 


■ dx 


qi + 1 


■3r 


denotes the remainder for the Taylor approximation of order $q$, $\{\alpha_j\}$ are the cosines
of direction $x - \tilde x_k$, and $N(d)$ gives the number of terms in $\sum^{(q)}$. For $x \in \Gamma_k$, the
remainder can be bounded by

$$
|R_q(c_k, x)| \le \frac{M_{q,k}\, d}{q + d}\, \frac{\|x - \tilde x_k\|^{q+d}}{\|x - c_k\|^{d-1}} \sum\nolimits^{(q)} \prod_{j=1}^d \frac{1}{q_j!}
$$

since $|\alpha_j| \le 1$. This bound on the local error of the approximation $U_q$ of order $q$ of
$U$ and $E[|U(X) - U_q(X)|] = \sum_{k=1}^m P(A_k)\, E[|R_q(c_k, X)| \mid A_k]$ yield (8.22). ▲

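For the special case $d_x = 1$, the bound in (8.22) takes the form

$$
E[|U(X) - U_q(X)|] \le \frac{1}{q + 1}\, \frac{1}{q!} \sum_{k=1}^m M_{q,k}\, P(A_k)\, E\big[|X - \tilde x_k|^{q+1} \mid A_k\big]
\le \frac{M_q}{(q + 1)!} \sum_{k=1}^m P(A_k)\, E\big[|X - \tilde x_k|^{q+1} \mid A_k\big],
$$

with $M_q \ge M_{q,k}$, and coincides with the bound in (8.20).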

Theorem 8.7 The discrepancy $E[|U(X) - U_q(X)|]$ can be made as small as desired
by refining the partition $\{\Gamma_k\}$ of $\Gamma$.

Proof Consider a sequence of partitions $\{\Gamma_k\}$ of $\Gamma$ whose diameter decreases with
$m$, that is, $\max_{1 \le k \le m,\; x', x'' \in \Gamma_k} \|x' - x''\| \sim O(\varepsilon_m)$, $\varepsilon_m > 0$, and $\varepsilon_m \to 0$ as $m \to \infty$.
Since $P(A_k)\, E\big[\|X - \tilde x_k\|^{q+d}/\|X - c_k\|^{d-1} \mid A_k\big] \sim O(\varepsilon_m^{q+1})$, we have
$E[|U(X) - U_q(X)|] \to 0$ as $m \to \infty$ by (8.20). ▲

Similar results hold for functions $h(U)$ of $U$ under some conditions, for example,
functions $h : \mathbb{R} \to \mathbb{R}$ that are measurable and Lipschitz continuous. Results in
Theorems 8.5-8.7 extend to vector-valued solutions $U$.

We conclude the section with two examples involving solutions of a random
eigenvalue problem $AU = \Lambda U$ and a stochastic algebraic equation $AU = B$. The
random matrix $A$ in both problems is

$$
A = \begin{bmatrix}
X_1 + X_2 & -X_2 & 0 \\
-X_2 & X_2 + X_3 & -X_3 \\
0 & -X_3 & X_3
\end{bmatrix} \tag{8.24}
$$

where $X_i = F^{-1} \circ \Phi(G_i)$, $F$ is a Beta distribution with range $[a, b]$ and shape
parameters $(\xi, \eta)$, $G_i \sim N(0, 1)$, $E[G_i G_j] = \rho^{|i-j|}$, $i, j = 1, 2, 3$, and $\rho \in (-1, 1)$.

Example 8.11 Let $X = (X_1, X_2, X_3)$ be the random vector collecting the random entries
of matrix $A$ in (8.24) and $\tilde X$ a SROM for $X$ with parameters $\{\tilde x_k, p_k\}$, $k = 1, \ldots, m$.
The defining parameters of the corresponding SROMs for the eigenvalues $\{\Lambda_i(X)\}$
of $A$ are $\{\lambda_{i,k} = \Lambda_i(\tilde x_k), p_k\}$.

The piecewise linear approximations in (8.18) for the eigenvalues $\Lambda_i(X)$ of $A$ are

$$
\Lambda_{L,i}(X) = \sum_{k=1}^m \Big[\lambda_{i,k} + \sum_{r=1}^3 \lambda_{i,k}^{(r)}\, (X_r - \tilde x_{k,r})\Big]\, 1(X \in \Gamma_k), \tag{8.25}
$$

where $\lambda_{i,k} = \Lambda_i(\tilde x_k)$,

$$
\lambda_{i,k}^{(r)} = \frac{\partial \Lambda_i(X)}{\partial X_r}\bigg|_{X = \tilde x_k}
= -\,\frac{c_{1,k}^{(r)}\, \lambda_{i,k}^{n-1} + \cdots + c_{n-1,k}^{(r)}\, \lambda_{i,k} + c_{n,k}^{(r)}}{n\, \lambda_{i,k}^{n-1} + (n - 1)\, c_{1,k}\, \lambda_{i,k}^{n-2} + \cdots + c_{n-1,k}}, \quad k = 1, \ldots, m, \tag{8.26}
$$

$\{c_{i,k}\}$ are the coefficients $\{C_i\}$ of the characteristic equation
$\det(A - \Lambda I) = \Lambda^n + C_1 \Lambda^{n-1} + \cdots + C_{n-1} \Lambda + C_n = 0$ for $X = \tilde x_k$, $c_{i,k}^{(r)} = \partial C_i(X)/\partial X_r$ at $X = \tilde x_k$, $\{X_r\}$
are the coordinates of $X$, and $\{\tilde x_{k,r}\}$ denote the coordinates of $\tilde x_k$ ([11], Sect. 8.3.2.3).
The samples $\{\tilde x_k\}$ of $\tilde X$ are the centers of the Voronoi tessellation $\{\Gamma_k\}$ in $\Gamma$. Similar
approximations can be constructed for the eigenvectors of $A$ ([11], Sect. 8.3.2.3).

The heavy solid lines in Figs. 8.4 and 8.5 are Monte Carlo estimates for the
distributions $F_i(\lambda) = P(\Lambda_i \le \lambda)$, $i = 1, 2, 3$, of the eigenvalues of $A$ based on 1000
independent samples of this matrix. The thin solid and heavy dotted lines are SROM-
and ESROM-based solutions for the distributions $F_i(\lambda)$. The left and right panels in
the figures are for $m = 10$ and $m = 20$. The plots in Figs. 8.4 and 8.5 are for $\rho = 0.2$
and $\rho = 0.9$. Some ESROM-based solutions are not visible since they coincide with
Monte Carlo estimates of $F_i(\lambda)$ at the figure scale.

The accuracy of the SROM- and ESROM-based solutions improves significantly
as the size of $\tilde X$ increases from $m = 10$ to $m = 20$, in agreement with
Theorem 8.7. Superior approximations result for $\rho = 0.9$ since a random vector with
strongly correlated coordinates has less uncertainty in the sense that, given a
coordinate of this vector, the uncertainty in its unspecified coordinates is reduced
significantly. ESROM-based solutions are superior since they use representations of
mapping $X \mapsto \Lambda$ that are more accurate than those for SROM-based solutions. Note
also that SROMs $\tilde U$ for $U$ cannot provide any information on $U$ outside their range
$[\min_{1 \le k \le m} U(\tilde x_k), \max_{1 \le k \le m} U(\tilde x_k)]$. ◊

Fig. 8.4 Distributions $F_i(\lambda)$ (heavy solid lines) and approximations of these distributions by SROM
(thin solid lines) and ESROM (heavy dotted lines) for $\rho = 0.2$, $m = 10$ (left panel), and $m = 20$
(right panel)

Fig. 8.5 Distributions $F_i(\lambda)$ (heavy solid lines) and approximations of these distributions by SROM
(thin solid lines) and ESROM (heavy dotted lines) for $\rho = 0.9$, $m = 10$ (left panel), and $m = 20$
(right panel)
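The eigenvalue sensitivities in (8.26) can be evaluated numerically for the matrix in (8.24). The sketch below obtains the characteristic coefficients with `numpy.poly` and approximates their derivatives $c_{i,k}^{(r)}$ by central differences, which is a simplification adopted here rather than the procedure of [11]. As an independent cross-check it uses the standard first-order perturbation result for symmetric matrices with simple eigenvalues, $\partial \Lambda_i/\partial X_r = v_i^T (\partial A/\partial X_r) v_i$ with $v_i$ a unit eigenvector. The evaluation point is an arbitrary assumed value.

```python
import numpy as np

# dA/dX_r for r = 1, 2, 3; A in (8.24) is linear in X, so A(x) = sum_r x_r DA[r].
DA = np.array([
    [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    [[1.0, -1.0, 0.0], [-1.0, 1.0, 0.0], [0.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [0.0, 1.0, -1.0], [0.0, -1.0, 1.0]],
])

def matrix_A(x):
    return np.tensordot(x, DA, axes=1)

def char_coeffs(x):
    """Coefficients (C_1, ..., C_n) of Lambda^n + C_1 Lambda^(n-1) + ... + C_n."""
    return np.poly(matrix_A(x))[1:]

def eigen_gradients(x, h=1e-5):
    """Eigenvalues and their derivatives from (8.26); row i, column r."""
    x = np.asarray(x, dtype=float)
    n = 3
    lam = np.linalg.eigvalsh(matrix_A(x))               # ascending order
    c = char_coeffs(x)
    powers = lam[:, None] ** np.arange(n - 1, -1, -1)   # lam^(n-1), ..., lam^0
    denom = n * lam ** (n - 1)
    for i in range(1, n):
        denom = denom + (n - i) * c[i - 1] * lam ** (n - i - 1)
    grads = np.empty((n, x.size))
    for r in range(x.size):
        step = np.zeros(x.size)
        step[r] = h
        dc = (char_coeffs(x + step) - char_coeffs(x - step)) / (2.0 * h)
        grads[:, r] = -(powers @ dc) / denom
    return lam, grads

def eigen_gradients_reference(x):
    """Cross-check: v_i^T (dA/dX_r) v_i for unit eigenvectors v_i."""
    lam, vec = np.linalg.eigh(matrix_A(np.asarray(x, dtype=float)))
    return lam, np.einsum("ji,rjk,ki->ir", vec, DA, vec)

x0 = np.array([1.5, 2.0, 1.2])          # assumed evaluation point
lam, grads = eigen_gradients(x0)
_, grads_ref = eigen_gradients_reference(x0)
print(np.max(np.abs(grads - grads_ref)))
```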

Example 8.12 Let $U$ be the solution of $AU = B$ with $A$ in (8.24) and $B$ the unit vector
in $\mathbb{R}^3$. We use the SROMs $\tilde X$ in Example 8.11 to construct $U_L(X)$. Let $\{\tilde x_k, p_k\}$,
$k = 1, \ldots, m$, be the defining parameters of a SROM $\tilde X$ of $X$, and denote by $U(\tilde x_k)$ the
solution of the deterministic algebraic equation $AU = B$ for $X = \tilde x_k$, $k = 1, \ldots, m$.
The gradients of $U(X)$ at $X = \tilde x_k$ satisfy the deterministic algebraic equations

$$
A(X)\, \frac{\partial U(X)}{\partial X_r} = -\frac{\partial A(X)}{\partial X_r}\, U(X), \quad r = 1, 2, 3, \tag{8.27}
$$

for $X = \tilde x_k$, $k = 1, \ldots, m$. Note that the deterministic algebraic equations for $U(\tilde x_k)$
and $\partial U(\tilde x_k)/\partial X_r$, $r = 1, 2, 3$, have the same operator and

$$
A(\tilde x_k) = \begin{bmatrix}
\tilde x_{k,1} + \tilde x_{k,2} & -\tilde x_{k,2} & 0 \\
-\tilde x_{k,2} & \tilde x_{k,2} + \tilde x_{k,3} & -\tilde x_{k,3} \\
0 & -\tilde x_{k,3} & \tilde x_{k,3}
\end{bmatrix} \tag{8.28}
$$

for each $k = 1, \ldots, m$. The partial derivatives $\{\partial A(X)/\partial X_r\}$ result simply from the
definition of $A$, for example,

$$
\frac{\partial A(\tilde x_k)}{\partial X_2} = \begin{bmatrix}
1 & -1 & 0 \\
-1 & 1 & 0 \\
0 & 0 & 0
\end{bmatrix}.
$$

The ESROM-based solution in (8.18) has the expression

$$
U_L(X) = \sum_{k=1}^m \Big[U(\tilde x_k) - \sum_{r=1}^3 A(\tilde x_k)^{-1}\, \frac{\partial A(\tilde x_k)}{\partial X_r}\, U(\tilde x_k)\, (X_r - \tilde x_{k,r})\Big]\, 1(X \in \Gamma_k). \tag{8.29}
$$

Since the expression of $U_L(X)$ is known, its properties can be calculated efficiently
and accurately by Monte Carlo simulation.

Figure 8.6 shows estimates of the distributions $F_i(u) = P(U_i(X) \le u)$, $i = 1, 2, 3$,
of the coordinates of $U(X)$ obtained from 1000 independent samples of $A$ (heavy
solid lines). SROM- and ESROM-based solutions of these distributions are in thin
solid and heavy dotted lines for $\rho = 0.2$. Similar plots are in Fig. 8.7 for $\rho = 0.9$.
The left and right panels are for $m = 10$ and $m = 20$. The accuracy of both SROM-
and ESROM-based solutions increases with $m$ and $\rho$. As in the previous example,
ESROM-based solutions are superior. The ESROM-based solutions in Fig. 8.7 for
$m = 20$ are indistinguishable from Monte Carlo estimates of $F_i(u)$ at the figure
scale. ◊

Fig. 8.6 Monte Carlo estimates for $F_i(u)$ (heavy solid lines), SROM-based solutions (thin solid
lines), and ESROM-based solutions (heavy dotted lines) for $\rho = 0.2$, $m = 10$ (left panel), and
$m = 20$ (right panel)

Fig. 8.7 Monte Carlo estimates for $F_i(u)$ (heavy solid lines), SROM-based solutions (thin solid
lines), and ESROM-based solutions (heavy dotted lines) for $\rho = 0.9$, $m = 10$ (left panel), and
$m = 20$ (right panel)
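The gradient equations (8.27) and the representation (8.29) translate into a few linear solves that share the same matrix. The sketch below takes $B = (0, 0, 1)$, which is an assumption about the unit vector used in Example 8.12, and uses two hypothetical centers instead of an optimized SROM; nearest-center assignment stands in for the Voronoi tessellation.

```python
import numpy as np

# dA/dX_r for r = 1, 2, 3; A in (8.24) is linear in X, so A(x) = sum_r x_r DA[r].
DA = np.array([
    [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],
    [[1.0, -1.0, 0.0], [-1.0, 1.0, 0.0], [0.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [0.0, 1.0, -1.0], [0.0, -1.0, 1.0]],
])

def matrix_A(x):
    return np.tensordot(x, DA, axes=1)

def solution_and_gradient(x, B):
    """Solve AU = B and the three gradient equations (8.27) with the same operator."""
    A = matrix_A(x)
    u = np.linalg.solve(A, B)
    rhs = -np.stack([D @ u for D in DA], axis=1)   # column r is -(dA/dX_r) u
    grad = np.linalg.solve(A, rhs)                 # column r is dU/dX_r
    return u, grad

def esrom_solution(x, centers, B):
    """Evaluate U_L(x) of (8.29); the cell is found by nearest-center search."""
    k = np.linalg.norm(centers - x, axis=1).argmin()
    u_k, grad_k = solution_and_gradient(centers[k], B)
    return u_k + grad_k @ (x - centers[k])

B = np.array([0.0, 0.0, 1.0])                               # assumed right side
centers = np.array([[1.5, 1.5, 1.5], [2.5, 2.5, 2.5]])      # hypothetical SROM samples
x = np.array([1.52, 1.49, 1.51])

u_lin = esrom_solution(x, centers, B)
u_exact = np.linalg.solve(matrix_A(x), B)
print(u_lin, u_exact)
```

Because $A(\tilde x_k)$ is common to the equations for $U(\tilde x_k)$ and its gradients, a single factorization of $A(\tilde x_k)$ could be reused for all four right sides in a larger problem.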


8.2.4 Stochastic Galerkin Method 

Let $X = (X_1, \ldots, X_{d_x})$ be as previously an $\mathbb{R}^{d_x}$-valued random variable including
the random parameters in matrices $A$ and $B$ of (8.1). There exists a nonlinear mapping,

$$
\begin{aligned}
F_1(X_1) &= V_1 \\
F_{2|1}(X_2 \mid X_1) &= V_2 \\
&\;\;\vdots \\
F_{d_x | d_x - 1, \ldots, 1}(X_{d_x} \mid X_{d_x - 1}, \ldots, X_1) &= V_{d_x},
\end{aligned} \tag{8.30}
$$

relating $X$ to a $d_x$-dimensional vector $V = (V_1, \ldots, V_{d_x})$ whose coordinates
are independent random variables uniformly distributed in $(0, 1)$ ([23],
Theorem 3.5.1). The mapping depends on the distribution $F_1$ of $X_1$ and the
distributions $F_{j|j-1,\ldots,1}(x_j \mid x_{j-1}, \ldots, x_1) = \int_{-\infty}^{x_j} f_{j|j-1,\ldots,1}(u \mid x_{j-1}, \ldots, x_1)\, du$,
$j = 2, \ldots, d_x$, where $f_{j|j-1,\ldots,1}(\cdot \mid x_{j-1}, \ldots, x_1)$ denotes the probability density
function of the conditional random variable $X_j \mid (X_{j-1} = x_{j-1}, \ldots, X_1 = x_1)$.

An alternative form of (8.30) can be obtained by using the representations
$V_j = \Phi(G_j)$, $j = 1, \ldots, d_x$, where $\{G_j\}$ are independent $N(0, 1)$ variables. This version
of (8.30) defines a mapping $G = (G_1, \ldots, G_{d_x}) \mapsto X = h(G)$ relating $X$ to a
$d_x$-dimensional vector $G$ with independent $N(0, 1)$ coordinates.

Example 8.13 Let $X$ be a bivariate translation vector defined by $X_j = F_j^{-1}(\Phi(G_j))$,
$j = 1, 2$, where $F_j$, $j = 1, 2$, are continuous distributions, $(G_1, G_2)$ are Gaussian
variables with $E[G_j] = 0$, $E[G_j^2] = 1$, $j = 1, 2$, and $E[G_1 G_2] = \rho$, $|\rho| < 1$. The
mapping $(V_1, V_2) \mapsto (X_1, X_2)$ in (8.30) is

$$
\begin{aligned}
X_1 &= F_1^{-1}(V_1) \\
X_2 \mid X_1 &= F_2^{-1}\Big(\Phi\big(\rho\,\Phi^{-1}(F_1(X_1)) + \sqrt{1-\rho^2}\,\Phi^{-1}(V_2)\big)\Big).
\end{aligned}
\tag{8.31}
$$

This representation is not needed for translation vectors as considered in this example,
since $X = (X_1, X_2)$ is related simply to $G = (G_1, G_2)$ in this case. ◊

Proof The formulas in (8.31) result from $X_j = F_j^{-1}(\Phi(G_j))$, $j = 1, 2$, the property
$G_2 \mid G_1 = \rho\, G_1 + \sqrt{1-\rho^2}\, N(0, 1)$ of Gaussian variables, and the relationship
between $N(0, 1)$ and $U(0, 1)$ variables. ▲
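
The mapping in (8.31) can be sketched numerically. The fragment below is only an illustration: the exponential marginals, their rates, and the names `translation_pair`, `exp_cdf`, and `exp_inv_cdf` are assumptions made for the example and do not come from the text.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()   # cdf is Phi, inv_cdf is Phi^{-1}
_EPS = 1e-12                # keeps probabilities strictly inside (0, 1)


def _clip(p):
    """Clamp a probability away from 0 and 1 so the inverse maps stay finite."""
    return min(max(p, _EPS), 1.0 - _EPS)


def exp_cdf(x, rate):
    """Exponential distribution F(x) = 1 - exp(-rate x) (assumed marginal)."""
    return 1.0 - math.exp(-rate * x)


def exp_inv_cdf(v, rate):
    """Inverse of the exponential distribution."""
    return -math.log(1.0 - v) / rate


def translation_pair(v1, v2, rho, rate1=1.0, rate2=2.0):
    """Map independent U(0,1) values (v1, v2) to (x1, x2) following (8.31)."""
    x1 = exp_inv_cdf(_clip(v1), rate1)                    # X1 = F1^{-1}(V1)
    g1 = STD_NORMAL.inv_cdf(_clip(exp_cdf(x1, rate1)))    # Phi^{-1}(F1(X1))
    g2 = rho * g1 + math.sqrt(1.0 - rho ** 2) * STD_NORMAL.inv_cdf(_clip(v2))
    x2 = exp_inv_cdf(_clip(STD_NORMAL.cdf(g2)), rate2)    # F2^{-1}(Phi(g2))
    return x1, x2
```

For $\rho = 0$ the second line of (8.31) reduces to $X_2 = F_2^{-1}(V_2)$, which gives a quick consistency check of the sketch.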

The solution $U$ of (8.1) with $X = h(G)$ given by a version of (8.30) relating $X$ to $G$
can be viewed as an unknown function of $G$, where $G = (G_1, \ldots, G_{d_x})$ is a Gaussian
vector with independent standard normal coordinates. Hence, the coordinates of both
$X$ and $U$ can be represented by series of Hermite polynomials of $G$, referred to
as polynomial chaos expansions. For example, the coordinate $U_i$ of $U$ admits the
representation

$$
U_i = \sum_{n=0}^{\infty} \sum_{n_1+n_2+\cdots=n} a^{(i)}_{n_1,n_2,\ldots} \prod_{k=1}^{d_x} \frac{H_{n_k}(G_k)}{\sqrt{n_k!}}
    = \sum_{n=0}^{\infty} \sum_{n_1+n_2+\cdots=n} a^{(i)}_{n_1,n_2,\ldots}\, \mathcal{H}_{n_1,n_2,\ldots}(G), \tag{8.32}
$$

where $\mathcal{H}_{n_1,n_2,\ldots}(G) = \prod_k H_{n_k}(G_k)/\sqrt{n_k!}$, $n_k \ge 0$ are integers, $\{H_{n_k}\}$ denote
Hermite polynomials, $\{a^{(i)}_{n_1,n_2,\ldots}\}$ are unknown deterministic coefficients, and $\{G_k\}$ are
the coordinates of $G$. The equality in (8.32) holds in the mean square sense ([10],
Sect. 3.3.6, [17], Sect. 9.5, and Sect. B.6.1 of this book). Note that the coordinates of
$X$ admit representations as in (8.32) and that their coefficients can be calculated from
the probability law of $X$, which is known.

For numerical calculations the infinite series in (8.32) needs to be truncated to a
finite number of terms, so that the coordinates $U_i$ of $U$ are approximated by

$$
\tilde U_i = \sum_{n=0}^{\bar n} \sum_{n_1+n_2+\cdots=n} a^{(i)}_{n_1,n_2,\ldots}\, \mathcal{H}_{n_1,n_2,\ldots}(G), \quad i = 1, \ldots, d, \tag{8.33}
$$

where $\bar n \ge 1$ is an integer. The representation $\tilde U$ of $U$ in (8.33) is used to construct the
stochastic Galerkin solution for (8.1) ([10], Sect. 3.2.3). Similarly, the coordinates
$X_j$ of $X$ can be approximated by finite sums as in (8.33), which are denoted by $\tilde X_j$. Note
that the meaning of $\tilde X$ and $\tilde U$ in this section differs from that in the previous section.

The construction of stochastic Galerkin solutions is based on the following facts,
which are discussed in Appendix B and summarized here for convenience. The space
$L^2(\Omega, \mathcal{F}, P)$ admits the decomposition $L^2(\Omega, \mathcal{F}, P) = \oplus_{n=0}^{\infty} K_n$, where the members
of subspaces $K_n$ are called homogeneous chaoses of order $n$ (Theorem B.73
and Definition B.44). Hence, any member $\psi$ of $L^2(\Omega, \mathcal{F}, P)$ can be given in the
form $\psi = \sum_{n=0}^{\infty} \psi_n$, where $\psi_n \in K_n$ (Theorem B.73). The collection of functions
$\{\mathcal{H}_{n_1,n_2,\ldots} : n_k \ge 0,\ n_1 + n_2 + \cdots = n\}$ is an orthonormal basis for $K_n$ (Theorem
B.73), where $\mathcal{H}_{n_1,n_2,\ldots} = \prod_{k \ge 1} H_{n_k}(G_k)/\sqrt{n_k!}$ and $\{G_k\}$ are independent $N(0, 1)$
variables. The functions $\{\mathcal{H}_{n_1,n_2,\ldots} : n_1 + n_2 + \cdots = n,\ n = 0, 1, \ldots\}$ define an orthonormal
basis for the Hilbert space $L^2(\Omega, \mathcal{F}^B, P)$ so that every $\psi \in L^2(\Omega, \mathcal{F}^B, P)$
admits the representation $\psi = \sum_{n=0}^{\infty} \sum_{n_1+n_2+\cdots=n} a_{n_1,n_2,\ldots}\, \mathcal{H}_{n_1,n_2,\ldots}$, where
$n_1, n_2, \ldots \ge 0$ and $a_{n_1,n_2,\ldots} = E[\psi\, \mathcal{H}_{n_1,n_2,\ldots}]$ (Theorem B.75).
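
The orthonormality of the scaled Hermite polynomials $H_n(G)/\sqrt{n!}$ for a single $N(0,1)$ variable can be checked numerically. The sketch below is illustrative only: the function name `hermite_gram` and the choice of Gauss-Hermite quadrature are assumptions, and the check covers the one-dimensional case rather than the full product basis.

```python
import math

import numpy as np
from numpy.polynomial import hermite_e as He


def hermite_gram(order, n_nodes=40):
    """Return the matrix E[psi_m(G) psi_n(G)], psi_n = H_n / sqrt(n!), G ~ N(0,1)."""
    # Nodes/weights for the weight exp(-x^2/2); the weights sum to sqrt(2 pi).
    x, w = He.hermegauss(n_nodes)
    w = w / math.sqrt(2.0 * math.pi)      # now sum(w * f(x)) approximates E[f(G)]
    psi = np.array([
        He.hermeval(x, np.eye(order + 1)[n]) / math.sqrt(math.factorial(n))
        for n in range(order + 1)
    ])
    return (psi * w) @ psi.T
```

With these definitions the returned matrix should be close to the identity, which is the one-dimensional statement of the orthonormality used above.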

The stochastic Galerkin method assumes that both the random elements of (8.1)
and the solution of this equation can be represented satisfactorily by members of a
subspace $\mathcal{W}_{\bar n} = \oplus_{n=0}^{\bar n} K_n$ of $L^2(\Omega, \mathcal{F}, P)$. For example, $\tilde U_i$ in (8.33) is a member
of $\mathcal{W}_{\bar n} = \oplus_{n=0}^{\bar n} K_n$. The notations in this section are commonly used in applications,
and differ slightly from those in Appendix B.

The stochastic Galerkin method gives weak solutions for approximate versions of
$AU = B$ in (8.1) obtained by representing the random elements in the definition of
this equation by members of $\mathcal{W}_{\bar n}$. Let $(\tilde A, \tilde B)$ be the matrices $(A, B)$ with $\tilde X$ in place
of $X$, where $\tilde X$ denotes the projection of $X$ on $\mathcal{W}_{\bar n}$. Conditions for the existence and
uniqueness of the weak solution of $\tilde A \tilde U = \tilde B$ can be established by arguments similar
to those in [1] used to construct weak solutions for stochastic partial differential
equations.

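Definition 8.2 The weak solution $\tilde U \in \mathcal{W}_{\bar n}$ of $\tilde A \tilde U = \tilde B$ is given by

$$
\tilde{\mathcal{B}}(\tilde U, W) = E[\tilde B' W^*], \quad \forall W \in \mathcal{W}_{\bar n}, \tag{8.34}
$$

where $\tilde{\mathcal{B}}(\tilde U, W) = E[(\tilde A \tilde U)' W^*]$.

Theorem 8.8 Let $(\tilde A, \tilde B)$ be $(A, B)$ with $\tilde X$ in place of $X$. If $\tilde{\mathcal{B}}$ is a bounded, positive
definite bilinear form, the weak form

$$
\tilde{\mathcal{B}}(\tilde U, W) = E[\tilde B' W^*], \quad \forall W \in \mathcal{W}_{\bar n}, \tag{8.35}
$$

of $\tilde A \tilde U = \tilde B$ has a unique solution.

Proof Apply the Lax-Milgram theorem (Theorem B.44). ▲

Theorem 8.1 can be used to bound the discrepancy between weak solutions of
$\tilde A \tilde U = \tilde B$ and $AU = B$, that is, the solutions of $\tilde{\mathcal{B}}(\tilde U, W) = E[\tilde B' W^*]$ and
$\mathcal{B}(U, W) = E[B' W^*]$ for all $W \in \mathcal{W}_{\bar n}$, where $\mathcal{B}(U, W) = E[(AU)' W^*]$.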

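Theorem 8.9 Let $\tilde U$ be an $\mathbb{R}^d$-valued random variable with coordinates $\tilde U_i \in \mathcal{W}_{\bar n}$
given by (8.33). The unknown coefficients in the expression of $\tilde U$ can be calculated
from the system of equations $E[\tilde A \tilde U\, \mathcal{H}_{n_1,n_2,\ldots}(G)] = E[\tilde B\, \mathcal{H}_{n_1,n_2,\ldots}(G)]$, where $n_k \ge 0$,
$n_1 + n_2 + \cdots = n$, $n = 0, 1, \ldots, \bar n$, and $(\tilde A, \tilde B)$ are approximations of $(A, B)$
obtained by projecting the random elements in these matrices on $\mathcal{W}_{\bar n}$.

Proof If $\tilde U \in \mathcal{W}_{\bar n}$ is a weak solution for (8.1), it satisfies $E[(\tilde A \tilde U)' W^*] = E[\tilde B' W^*]$,
which gives $\sum_{i,j=1}^{d} E[\tilde A_{ij} \tilde U_j W_i^*] = \sum_{i=1}^{d} E[\tilde B_i W_i^*]$ for trial vectors $W \in \mathcal{W}_{\bar n}$ with
coordinates $W_i = \sum_{n=0}^{\bar n} \sum_{n_1+n_2+\cdots=n} d^{(i)}_{n_1,n_2,\ldots}\, \mathcal{H}_{n_1,n_2,\ldots}(G)$. An equivalent form of
this condition is

$$
\sum_{n=0}^{\bar n} \sum_{n_1+n_2+\cdots=n} \sum_{i=1}^{d} d^{(i)*}_{n_1,n_2,\ldots}
\Big( \sum_{j=1}^{d} E[\tilde A_{ij} \tilde U_j\, \mathcal{H}_{n_1,n_2,\ldots}(G)] - E[\tilde B_i\, \mathcal{H}_{n_1,n_2,\ldots}(G)] \Big) = 0,
$$

which yields the stated equations for the coefficients in the expression of $\tilde U$ since
$\{d^{(i)}_{n_1,n_2,\ldots}\}$ are arbitrary. Note that the equations defining the coefficients in the
expression of $\tilde U$ constitute projections of $\tilde A \tilde U = \tilde B$ on $\mathcal{W}_{\bar n}$, where $(\tilde A, \tilde B)$ are $(A, B)$
with $X$ replaced with $\tilde X$. ▲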

The construction of the polynomial chaos representation $\tilde U$ in (8.33) can pose
notable difficulties since the dimension of $X$ is large in realistic applications, $\bar n$ needs
to be relatively large for solution accuracy, and the equations satisfied by the
coefficients in the expression of $\tilde U$ are coupled.

Example 8.14 Consider (8.1) with $d = 1$, $A = \exp(X)$, $X \sim N(0, 1)$, and $B = 1$.
The exact solution of this equation is $U = \exp(-X)$. The approximate solution given
by (8.33) has the expression (B.50)

$$
\tilde U = \sum_{n=0}^{\bar n} \frac{1}{\sqrt{n!}}\, a_n H_n(X), \tag{8.36}
$$

depending on the coefficients $(a_0, a_1, \ldots, a_{\bar n})$ that can be calculated from a linear
system with $\bar n + 1$ equations. We have $a_0 = 1.1031$, $a_1 = -0.3927$, $a_2 = -0.3126$,
and $a_3 = 0.2883$ for $\bar n = 3$ and $a_0 = 1.1031$, $a_1 = 1.5444$, $a_2 = -1.3195$,
$a_3 = 0.5655$, $a_4 = -0.2063$, and $a_5 = 0.0987$ for $\bar n = 5$. Figure 8.8 shows with
dashed and solid lines the exact mapping $x \mapsto u = \exp(-x)$ and the approximate
mappings $x \mapsto \tilde u = \sum_{n=0}^{\bar n} a_n H_n(x)/\sqrt{n!}$ given by (8.36) with $\bar n = 3$ (left panel)
and $\bar n = 5$ (right panel). The approximate mapping improves with $\bar n$. Yet, there are
notable differences between the exact mapping and its approximations. The mean
and standard deviation of $\tilde U$ are 1.1035 and 0.5802 for $\bar n = 3$ and 1.5459 and 1.4539
for $\bar n = 5$. For $\bar n = 5$, the mean and standard deviation of $\tilde U$ are in error by −6.4%
and −33.3%, respectively. ◊


Proof The lognormal variable $A$ has the representation $A = e^{1/2} \sum_{m=0}^{\infty} H_m(X)/m!$
(Example B.44), where $H_m$ are Hermite polynomials defined by (B.46). The first few
Hermite polynomials are $H_0(x) = 1$, $H_1(x) = x$, $H_2(x) = x^2 - 1$, $H_3(x) = x^3 - 3x$,
$H_4(x) = x^4 - 6x^2 + 3$, and $H_5(x) = x^5 - 10x^3 + 15x$. The solution $U$ can be
expressed as $U = \sum_{n=0}^{\infty} a_n H_n(X)/\sqrt{n!}$, where the equality holds in the mean square
sense. The coefficients $a_n$ of a truncated version $\tilde U = \sum_{n=0}^{\bar n} a_n H_n(X)/\sqrt{n!}$ of $U$ can
be calculated from (Theorem 8.9)

$$
\sum_{n=0}^{\bar n} \frac{1}{\sqrt{n!}}\, a_n \sum_{m=0}^{\bar n} \frac{1}{m!}\, E[H_m(X) H_n(X) H_r(X)]
= e^{-1/2} E[H_r(X)], \quad r = 0, 1, \ldots, \bar n, \tag{8.37}
$$

a coupled system of $\bar n + 1$ equations for the unknown coefficients $(a_0, a_1, \ldots, a_{\bar n})$. ▲

Fig. 8.8 Mappings $x \mapsto u = \exp(-x)$ (dotted lines) and $x \mapsto \tilde u$ (solid lines) with $\bar n = 3$ (left
panel) and $\bar n = 5$ (right panel)
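
A Galerkin system of the type in (8.37) can be assembled numerically as sketched below. This is a hedged illustration rather than the calculation behind Example 8.14: the function name `galerkin_coefficients` is hypothetical, the expectations are evaluated by Gauss-Hermite quadrature, and the lognormal coefficient $A = \exp(X)$ is used exactly instead of being replaced by its truncated Hermite series, so the resulting coefficients need not coincide with those quoted in the example.

```python
import math

import numpy as np
from numpy.polynomial import hermite_e as He


def galerkin_coefficients(order, n_nodes=60):
    """Coefficients a_n of U~ = sum_n a_n H_n(X)/sqrt(n!) for exp(X) U = 1."""
    x, w = He.hermegauss(n_nodes)
    w = w / math.sqrt(2.0 * math.pi)          # expectation weights under N(0,1)
    # psi[n] holds H_n(x)/sqrt(n!) evaluated at the quadrature nodes.
    psi = np.array([
        He.hermeval(x, np.eye(order + 1)[n]) / math.sqrt(math.factorial(n))
        for n in range(order + 1)
    ])
    a_vals = np.exp(x)                        # exact A = exp(X) (assumption, see lead-in)
    lhs = (psi * (w * a_vals)) @ psi.T        # lhs[r, n] = E[A psi_n psi_r]
    rhs = psi @ w                             # rhs[r] = E[B psi_r] with B = 1
    return np.linalg.solve(lhs, rhs)
```

For order 1 the system can be solved by hand, giving $a_0 = 2e^{-1/2}$ and $a_1 = -e^{-1/2}$, which offers a simple check of the assembly. The coupling between coefficients is visible in the full matrix `lhs`.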

The stochastic Galerkin method is particularly attractive for applications since (1)
the approximate solutions $\tilde U$ are represented by finite sums of Hermite polynomials
of Gaussian variables with unknown deterministic coefficients, (2) the unknown
coefficients in the expression of $\tilde U$ satisfy deterministic equations, and (3) properties
of $\tilde U$ can be estimated efficiently from the expression of $\tilde U$ and samples of Gaussian
variables since the expression of $\tilde U$ is available.

However, the method has notable limitations related primarily to computation
demand and accuracy. The computation effort needed to construct a Galerkin solution
$\tilde U$ increases rapidly with the order $\bar n$ of the approximation and/or the size of $X$. The
number of terms in the representation given by (8.33) for each coordinate of $\tilde U$ is equal
to $(d_x + \bar n)!/(d_x!\, \bar n!)$. For example, this number is $(1 + \bar n)!/\bar n! = 1 + \bar n$ for $d_x = 1$, in
agreement with (8.36). Also, as previously mentioned, the equations for the unknown
coefficients of the polynomial chaos representation for $\tilde U$ are coupled, so that efficient
and robust solvers are needed for calculating these coefficients. Since $d_x$ is usually
large in applications, the order $\bar n$ needs to be kept small. This constraint may result
in unsatisfactory approximations, as illustrated by Example 8.14. Moreover, even if
$X$ admits a polynomial chaos representation with finite order, the Galerkin solution
$\tilde U$ may not belong to the space spanned by the polynomial chaoses representing $X$,
so that its accuracy may not be satisfactory.
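
The growth of the number of terms $(d_x + \bar n)!/(d_x!\,\bar n!)$ with the dimension and the order can be tabulated with a few lines of code. The helper name `chaos_terms` is an assumption introduced for this illustration.

```python
from math import comb


def chaos_terms(d_x, order):
    """Number of polynomial chaos terms per coordinate: (d_x + order)! / (d_x! order!)."""
    return comb(d_x + order, order)


if __name__ == "__main__":
    for d_x in (1, 10, 100):
        print(d_x, [chaos_terms(d_x, order) for order in (1, 2, 3, 5)])
```

Even a low order becomes expensive when $d_x$ is large, since every coordinate of the solution carries this many unknown coefficients.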

It has also been shown that high expansion orders are required when the dependence
of the solution $U$ on the input random parameters $X$ is not smooth [28].
A related matter is the convergence of the polynomial chaos representations. Under
the assumption that $U = A^{-1} B$ is in $L^2$, the sequence of polynomial chaos
representations $\tilde U$ converges in m.s. to $U$ as $\bar n \to \infty$, so that it also converges in both
probability and distribution. This convergence does not imply that moments of order 3
and higher of $U$ can be approximated by corresponding moments of $\tilde U$, $\bar n < \infty$, since
the sequence of higher order moments of $\tilde U$ may diverge ([9] and Example B.45).

The accuracy of stochastic Galerkin solutions can also be affected by other
factors, such as the type of polynomial used to approximate $X$ and $U$. It has been
shown that the accuracy of the Galerkin solution can be improved by using
Wiener-Askey polynomials of various types depending on the probability law of $X$. The
improvement relative to Hermite polynomials of Gaussian variables can be significant
when dealing with random variables taking values in bounded intervals [29].
This observation can yield efficient and accurate polynomial chaos representations
for the input random vector $X$ but it may be of limited value for the solution $U$ of
(8.1), since the probability law and the range of $U$ are unknown.

Example 8.15 Consider the stochastic algebraic equation $AU = 1$ in Example 8.5
with $A = X$ and $X$ uniformly distributed in $[a, b]$, $0 < a < b < \infty$. It has been
shown that Legendre, rather than Hermite, polynomials provide an optimal
representation for $X$ [29]. However, representations of solutions using polynomials that
are optimal for input parameters may not be optimal for solutions, as illustrated by
the following two figures. The solid and dotted lines in Fig. 8.9 are the exact mapping
$v \mapsto u$ and the approximate mapping $v \mapsto \tilde u$, where $\tilde u$ is the polynomial chaos
approximation of $u$ based on Legendre polynomials up to degree 4, $V \sim U(-1, 1)$, and
$X = (a + b)/2 + (b - a)V/2$. Results in the left and the right panels are for $(a, b) = (1, 3)$
and $(a, b) = (0.01, 3)$, respectively. The mappings in the left panel are indistinguishable
at the scale of the figure, but differ significantly in the right panel. The density
of $U$ and histograms of $\tilde U$ are shown in Fig. 8.10 for $(a, b) = (1, 3)$ (left panel)
and $(a, b) = (0.01, 3)$ (right panel). The histogram of $\tilde U$ matches the density of $U$
accurately for $(a, b) = (1, 3)$ and differs significantly from it for $(a, b) = (0.01, 3)$.
The mean and the standard deviation of $\tilde U$ nearly coincide with the corresponding
moments of $U$ for $(a, b) = (1, 3)$, but for $(a, b) = (0.01, 3)$ they are in error
by −23.19% and −64.35%. These numerical results are consistent with the quality
of the mapping $v \mapsto \tilde u$ in Fig. 8.9, and relate to differences between the support
and the shape of the distributions of $X$ and $U$. For $(a, b) = (0.01, 3)$, the mapping
$v \mapsto \tilde u$ is inaccurate. For example, the range of $U$ is $[1/b, 1/a] = [1/3, 100]$ while
the range of $\tilde U$ is $[0.2374, 9.2797]$ for $\bar n = 4$, so that $U$ and $\tilde U$ cannot have similar
distributions. Also, Legendre polynomial representations are not optimal for $U$ since
its distribution differs significantly from a uniform distribution, so that approximate
solutions based on these polynomials are not likely to be satisfactory. ◊

Proof Legendre polynomials are defined by $\phi_i(v) = d^i\big((v + 1)^i (v - 1)^i\big)/dv^i$,
$i = 0, 1, \ldots$, $v \in (-1, 1)$, are orthogonal, that is, $\int_{-1}^{1} \phi_i(v) \phi_j(v)\, dv = 0$ for $i \ne j$,
and have norm $\|\phi_i\|^2 = \int_{-1}^{1} \phi_i(v)^2\, dv/2 = (i!)^2\, 2^{2i}/(2i + 1)$ ([22], Chap. 12). Note
that $X = (a + b)/2 + (b - a)V/2 = [(a + b)/2]\, \phi_0(V) + [(b - a)/4]\, \phi_1(V)$, where
$V$ is uniformly distributed in $(-1, 1)$. Consider the approximate representation
$\tilde U = \sum_{i=0}^{\bar n} \beta_i \phi_i(V)$ of $U$. The unknown coefficients $\{\beta_i\}$ can be obtained by projecting
the approximate version $A \tilde U = 1$ of $AU = 1$ on $\phi_k$, $k = 0, 1, \ldots, \bar n$, which gives
the system of linear equations

$$
\sum_{i=0}^{\bar n} \Big( \frac{a + b}{2}\, E[\phi_i(V) \phi_k(V)] + \frac{b - a}{2}\, E[V \phi_i(V) \phi_k(V)] \Big) \beta_i = E[\phi_k(V)], \quad k = 0, 1, \ldots, \bar n,
$$

for $(\beta_0, \ldots, \beta_{\bar n})$. Resulting values of these coefficients and the representation
$\tilde U = \sum_{i=0}^{\bar n} \beta_i \phi_i(V)$ can be used in conjunction with Monte Carlo simulation to
find statistics of $U$ approximately. ▲

Fig. 8.9 Mappings $v \mapsto u$ (solid lines) and $v \mapsto \tilde u$ (dotted lines) for $A = X$ uniformly distributed
in $(a, b)$, where $(a, b) = (1, 3)$ (left panel), and $(a, b) = (0.01, 3)$ (right panel)

Fig. 8.10 Density of $U$ and histograms of $\tilde U$ for $(a, b) = (1, 3)$ (left panel) and $(a, b) = (0.01, 3)$
(right panel)
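
The linear system in the proof above can be set up as in the following sketch. It is an illustration under stated assumptions: the function names are hypothetical, the expectations are computed by Gauss-Legendre quadrature, and the standard Legendre polynomials $P_i$ of NumPy are used, which differ from $\phi_i$ above only by the scaling $\phi_i = 2^i\, i!\, P_i$ and therefore span the same subspace.

```python
import numpy as np
from numpy.polynomial import legendre as L


def legendre_galerkin(a, b, order, n_nodes=50):
    """Coefficients beta_i of U~ = sum_i beta_i P_i(V) for X U = 1, X = (a+b)/2 + (b-a)V/2."""
    v, w = L.leggauss(n_nodes)
    w = w / 2.0                                   # V ~ U(-1, 1) has density 1/2
    phi = np.array([L.legval(v, np.eye(order + 1)[i]) for i in range(order + 1)])
    x = 0.5 * (a + b) + 0.5 * (b - a) * v         # values of X at the nodes
    lhs = (phi * (w * x)) @ phi.T                 # lhs[k, i] = E[X P_i(V) P_k(V)]
    rhs = phi @ w                                 # rhs[k] = E[P_k(V)]
    return np.linalg.solve(lhs, rhs)


def evaluate(beta, v):
    """Approximate mapping v -> u~ for the coefficients beta."""
    return L.legval(v, beta)
```

Evaluating `evaluate(beta, v)` on a grid of $v \in [-1, 1]$ and comparing it with $1/x$ reproduces the kind of comparison shown in Fig. 8.9.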


8.2.5 Stochastic Collocation Method 

As for Monte Carlo simulation and SROMs, the implementation of the stochastic
collocation method involves solutions of deterministic versions of (8.1) for specified
values of $X$. These solutions exist if $A$ has an inverse a.s., so that we assume that $A^{-1}$
exists with probability 1. Let $(x^{(0)}, x^{(1)}, \ldots, x^{(n)})$ be $(n + 1)$ distinct points in the
range $\Gamma = X(\Omega)$ of the random vector $X$ collecting the random parameters in the
definition of matrices $A$ and $B$ in (8.1), referred to as collocation points. Consider the
approximation

$$
\tilde U = \sum_{i=0}^{n} u^{(i)}\, \ell_i(X) \tag{8.38}
$$

for $U$, where $u^{(i)}$ are solutions of (8.1) for $X$ set equal to $x^{(i)}$ and $\ell_i(X)$ are interpolating
polynomials centered at $x^{(i)}$, $i = 0, 1, \ldots, n$. Note that the accuracy of $\tilde U$ depends
on the number and the location of the collocation points $(x^{(0)}, x^{(1)}, \ldots, x^{(n)})$, the
type of interpolating polynomials, the properties of the mapping $X \mapsto U$, and the
probability law of $X$. Properties of $\tilde U$ can be obtained simply and efficiently from its
expression by, for example, Monte Carlo simulation since the mapping $X \mapsto \tilde U$ is
available analytically. The collocation method can be viewed as a response surface
that approximates the mapping $X \mapsto U$ from deterministic solutions of (8.1). Note
that $\tilde U$ in (8.38) and $\tilde U$ in previous sections have different meanings.

If the mapping $X \mapsto U$ defined by (8.1) is continuous, then it is possible to
construct a polynomial approximation $\tilde U$ for $U$ that has a specified accuracy, as
indicated by the following theorem stated for $d = 1$.

Theorem 8.10 For $g \in C[a, b]$ and $\varepsilon > 0$, there exists an algebraic polynomial
$p$ such that $\|g - p\|_\infty = \max_{a \le x \le b} |g(x) - p(x)| < \varepsilon$ ([22], Theorem 6.1, the
Weierstrass theorem).

Example 8.16 Consider the stochastic algebraic equation in Example 8.5 and let
$u^{(i)} = 1/x^{(i)}$ be the solution of $AU = 1$ for $A = X$ set equal to $x^{(i)} = a + i(b - a)/n$,
$i = 0, 1, \ldots, n$. The approximation $\tilde U$ of $U$ given by (8.38) is

$$
\tilde U = \sum_{i=0}^{n} \frac{1}{a + i(b - a)/n} \prod_{j=0,\, j \ne i}^{n} \frac{X - x^{(j)}}{x^{(i)} - x^{(j)}}
         = \sum_{i=0}^{n} \frac{1}{a + i(b - a)/n}\, \ell_i(X), \tag{8.39}
$$

where $\ell_i(X) = \prod_{j=0,\, j \ne i}^{n} (X - x^{(j)})/(x^{(i)} - x^{(j)})$ are Lagrange interpolating
polynomials. The approximate solution $\tilde U$ matches $U$ exactly at the collocation points
since $\ell_i(x^{(j)}) = \delta_{ij}$. Moreover, for every $\varepsilon > 0$, there is a polynomial approximation
$\tilde U$ such that $E[|\tilde U - U|^q] < \varepsilon^q$, where $q \ge 1$ is an integer. ◊

Proof The mapping $x \mapsto u(x) = 1/x$ is continuous for $x \ne 0$, so that there is a
polynomial $\tilde u(x)$ such that $\max_{a \le x \le b} |\tilde u(x) - u(x)| < \varepsilon$ (Theorem 8.10).
Accordingly, $E[|\tilde U - U|^q] < E[\varepsilon^q] = \varepsilon^q$. ▲
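
The collocation approximation in (8.39) is simple to evaluate numerically. The sketch below is illustrative: the function names are assumptions, and the mean of the approximation is computed by a midpoint rule on a fine grid rather than by Monte Carlo simulation, so its output is not meant to reproduce figures quoted elsewhere in the chapter.

```python
import numpy as np


def lagrange_collocation(a, b, n):
    """Return the equally spaced nodes and the mapping x -> u~(x) of (8.39) for U = 1/X."""
    nodes = a + (b - a) * np.arange(n + 1) / n
    values = 1.0 / nodes                      # deterministic solutions u^(i) = 1/x^(i)

    def u_tilde(x):
        x = np.asarray(x, dtype=float)
        total = np.zeros_like(x)
        for i in range(n + 1):
            basis = np.ones_like(x)           # Lagrange polynomial l_i(x)
            for j in range(n + 1):
                if j != i:
                    basis *= (x - nodes[j]) / (nodes[i] - nodes[j])
            total += values[i] * basis
        return total

    return nodes, u_tilde


def mean_relative_error(a, b, n, n_grid=20001):
    """Relative error of E[u~(X)] with respect to E[1/X] for X uniform in (a, b)."""
    _, u_tilde = lagrange_collocation(a, b, n)
    x = a + (b - a) * (np.arange(n_grid) + 0.5) / n_grid   # midpoint grid
    exact = np.log(b / a) / (b - a)
    return (u_tilde(x).mean() - exact) / exact
```

Calling `mean_relative_error` for intervals with $a$ close to zero shows how the quality of the interpolation degrades when $1/x$ varies rapidly over the range of $X$.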

Theorem 8.11 Let $g : [a, b] \to \mathbb{R}$ and $p$ be as in Theorem 8.10, $X$ a real-valued
random variable with support $[a, b]$, and $q > 0$ an integer. There exists a constant
$c > 0$ such that

$$
\big| E[g(X)^{q+1}] - E[p(X)^{q+1}] \big| \le c\, \|g - p\|_\infty. \tag{8.40}
$$

Proof The identity $\alpha^{q+1} - \beta^{q+1} = (\alpha - \beta) \sum_{k=0}^{q} \alpha^{q-k} \beta^k$ applied to
$E[g(X)^{q+1}] - E[p(X)^{q+1}] = E[g(X)^{q+1} - p(X)^{q+1}]$ gives

$$
\big| E[g(X)^{q+1} - p(X)^{q+1}] \big|
\le E\Big[ |g(X) - p(X)|\, \Big| \sum_{k=0}^{q} g(X)^{q-k} p(X)^k \Big| \Big]
\le \|g - p\|_\infty\, E\Big[ \Big| \sum_{k=0}^{q} g(X)^{q-k} p(X)^k \Big| \Big]
= c\, \|g - p\|_\infty,
$$

where $c = E\big[ \big| \sum_{k=0}^{q} g(X)^{q-k} p(X)^k \big| \big] > 0$ is a finite constant since $X$ has bounded
support and $g$ and $p$ are continuous in $[a, b]$ and, therefore, bounded in this interval.
The bound in (8.40) shows that moments of $g(X)$ can be approximated by
corresponding moments of $p(X)$ if $\|g - p\|_\infty$ is sufficiently small. ▲

Theorem 8.12 Let $g : [a, b] \to \mathbb{R}$ be a continuous function and $p$ a polynomial
of degree $n$ satisfying the interpolation conditions $p(x^{(i)}) = g(x^{(i)})$, where $x^{(i)}$,
$i = 0, 1, \ldots, n$, are distinct points in $[a, b]$. If $g \in C^{(n+1)}[a, b]$, the error $e_n(x) =
g(x) - p(x)$ is

$$
e_n(x) = \frac{1}{(n + 1)!} \prod_{i=0}^{n} (x - x^{(i)})\, g^{(n+1)}(\xi), \quad \xi \in [a, b], \tag{8.41}
$$

and $\lim_{n \to \infty} e_n(x) = 0$ for all $x \in [a, b]$.

Proof For proof, see [22] (Sect. 4.2). We only show the convergence $e_n(x) \to 0$
as $n \to \infty$. Since $g^{(n+1)}$ is continuous by assumption, $M = \max_{a \le x \le b} |g^{(n+1)}(x)|$
is finite, so that $|e_n(x)| \le (b - a)^{n+1} M/(n + 1)!$ by (8.41). Let $n^*$ be such that
$n^* < b - a \le n^* + 1$ so that, for $n > n^*$,

$$
|e_n(x)| \le \frac{(b - a)^{n^*} M}{(n^*)!} \prod_{k=n^*+1}^{n} \frac{b - a}{k + 1},
$$

so that $|e_n(x)| \to 0$ as $n \to \infty$. ▲

Example 8.17 Let $p(X) = \sum_{i=0}^{n} (1/x^{(i)})\, \ell_i(X)$ be a polynomial approximation for the
random variable $g(X) = 1/X$, where $x^{(i)}$ and $\ell_i$ are as in (8.39) and $X$ is uniformly
distributed in $[a, b]$, $0 < a < b < \infty$. The discrepancy between moments of $g(X)$
and $p(X)$ decreases with the degree $n$ of $p(X)$ in agreement with Theorem 8.11.
For example, if $a = 0.1$ and $b = 1$, the errors of $E[p(X)^6]$ relative to $E[g(X)^6]$ are
29.18%, 2.84%, 0.11%, and 0.07% for $n = 5$, 10, 20, and 50, respectively. These
errors are based on estimates of $E[p(X)^6]$ obtained from $10^6$ independent samples
of $X$. ◊

A particularly useful class of interpolation polynomials are the Bernstein
polynomials, which were originally introduced to provide an alternative proof of the
Weierstrass theorem ([19], Chap. 1).

Definition 8.3 For a real-valued function $g$ defined on the closed interval $[0, 1]$,

$$
B_n(x) = \sum_{i=0}^{n} g(i/n)\, \frac{n!}{i!\,(n - i)!}\, x^i (1 - x)^{n-i} = \sum_{i=0}^{n} g(i/n)\, p_{n,i}(x), \quad x \in [0, 1], \tag{8.42}
$$

is called the Bernstein polynomial of order $n$ for function $g$.

The polynomials $p_{n,i}(x) = [n!/(i!\,(n - i)!)]\, x^i (1 - x)^{n-i}$ in the expression of
$B_n(x)$ are nonnegative and satisfy the condition $\sum_{i=0}^{n} p_{n,i}(x) = 1$ for all $x \in [0, 1]$. The
latter property implies $m \le B_n(x) \le M$ if $m \le g(x) \le M$, $\forall x \in [0, 1]$.

Theorem 8.13 If $g : [0, 1] \to \mathbb{R}$ is a bounded function, then $\lim_{n \to \infty} B_n(x) = g(x)$
at each point of continuity $x$ of $g$. If $g$ is continuous in $[0, 1]$, then $\lim_{n \to \infty} B_n(x) =
g(x)$ holds uniformly in $[0, 1]$ ([19], Theorem 1.1.1).
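
The Bernstein polynomial in (8.42) can be evaluated directly from its definition. The function name `bernstein` in this minimal sketch is an assumption made for illustration.

```python
from math import comb


def bernstein(g, n, x):
    """Evaluate the Bernstein polynomial B_n(x) of order n for g on [0, 1], as in (8.42)."""
    return sum(
        g(i / n) * comb(n, i) * x ** i * (1.0 - x) ** (n - i)
        for i in range(n + 1)
    )
```

Two classical identities give a quick check: $B_n$ reproduces linear functions exactly, and for $g(t) = t^2$ one has $B_n(x) = x^2 + x(1 - x)/n$, which also illustrates the slow $O(1/n)$ convergence of these polynomials.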

The Weierstrass theorem is a direct consequence of Theorem 8.13. That $g$ has
support $[0, 1]$ is not restrictive since any interval $[a, b]$ can be mapped into $[0, 1]$ by
a linear mapping. For example,

$$
B_n(x; b) = \sum_{i=0}^{n} g(b\, i/n)\, p_{n,i}(x/b), \quad x \in [0, b], \tag{8.43}
$$

is a Bernstein polynomial for a bounded function $g : [0, b] \to \mathbb{R}$. The following
two theorems are extensions of the Bernstein polynomial representation in Theorem
8.13 to functions defined on unbounded intervals and multivariate functions.

Theorem 8.14 Let $g : [0, \infty) \to \mathbb{R}$ be a bounded function and $b_n \sim o(n)$. Then
$\lim_{n \to \infty} B_n(x; b_n) = g(x)$ at any point of continuity of $g$ ([19], Theorem 2.3.1).

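Theorem 8.15 Let $g : [0, 1]^{d_x} \to \mathbb{R}$ be a bounded function. Then the multivariate
Bernstein polynomial

$$
B_{n_1,\ldots,n_{d_x}}(x_1, \ldots, x_{d_x}) = \sum_{i_1=0}^{n_1} \cdots \sum_{i_{d_x}=0}^{n_{d_x}}
g\Big( \frac{i_1}{n_1}, \ldots, \frac{i_{d_x}}{n_{d_x}} \Big) \prod_{j=1}^{d_x} p_{n_j,i_j}(x_j), \quad x \in [0, 1]^{d_x}, \tag{8.44}
$$

has the property $\lim_{n_1,\ldots,n_{d_x} \to \infty} B_{n_1,\ldots,n_{d_x}}(x_1, \ldots, x_{d_x}) = g(x_1, \ldots, x_{d_x})$ at all
points of continuity of $g$. If $g$ is continuous in $[0, 1]^{d_x}$, then
$\lim_{n_1,\ldots,n_{d_x} \to \infty} B_{n_1,\ldots,n_{d_x}}(x_1, \ldots, x_{d_x}) = g(x_1, \ldots, x_{d_x})$ holds uniformly in $[0, 1]^{d_x}$
([14], [19], Sect. 2.9).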


This result shows that there is an analogue of the Weierstrass theorem (Theorem
8.10) in the multidimensional case. Theorem 8.15 implies that for any real-valued
continuous function $g$ defined on a bounded rectangle $R$ in $\mathbb{R}^{d_x}$ and for any $\varepsilon > 0$,
there exists an algebraic polynomial $p$ such that $\|g - p\|_\infty = \max_{x \in R} |g(x) - p(x)| < \varepsilon$.
Moreover, Theorem 8.11 holds in the multivariate case, that is, if $X$ is an
$\mathbb{R}^{d_x}$-valued random variable with support $R$, the moments of any order of $g(X)$ can be
approximated by moments of corresponding order of $p(X)$ at any desired accuracy.

The latter observations imply that, if $U$ in (8.1) is a continuous function of $X$ and
$X$ has a bounded range $\Gamma = X(\Omega)$, then $U$ can be approximated to any degree of
accuracy by a polynomial $\tilde U$ of $X$ with coefficients obtained from solutions of (8.1)
with $X$ set equal to collocation points selected in $\Gamma = X(\Omega)$, that is, the condition
$\|U - \tilde U\|_\infty < \varepsilon$ can be satisfied for arbitrary $\varepsilon > 0$. Accordingly, $\tilde U$ converges a.s.
to $U$ as its degree increases indefinitely, and moments of $U$ can be approximated by
moments of $\tilde U$ for a sufficiently large polynomial degree. The rate of convergence
of $\tilde U$ to $U$ depends on properties of $X$ and of the mapping $X \mapsto U$ as well as the
number and location of collocation points.

It is not possible to find the degree of an approximating polynomial for a required
accuracy since the mapping $X \mapsto U$ is not known. Note that the construction of
$\tilde U$ involves $\prod_{j=1}^{d_x} n_j$ solutions of distinct deterministic versions of (8.1), where $n_j$
denotes the number of collocation points along coordinate $j = 1, \ldots, d_x$, so that
the degree of $\tilde U$ needs to be relatively low unless the dimension $d_x$ of $X$ is small. For
example, $\prod_{j=1}^{d_x} n_j = 10^{100}$ for $n_j = 10$ and $d_x = 100$, so that this version of the
method may be impractical for many applications. Alternative collocation schemes
based on the Smolyak formula and other schemes have been proposed to reduce
calculations related to the solution of stochastic elliptic partial differential equations [21].
Additional information on these schemes, a priori bounds on the discrepancy between
exact and collocation solutions of (8.1), and comparisons between stochastic Galerkin
and collocation methods can be found elsewhere [2, 3].

Example 8.18 Consider the scalar stochastic algebraic equation $AU = 1$ in Example
8.16 with $A = X$ uniformly distributed in $(a, b)$. The approximate mapping $X \mapsto \tilde U$
is given by (8.39). The errors in percentages between the exact and approximate
means are 0.085%, 0.026%, and 0.0259% for $(a, b) = (1, 3)$ and $n = 4$, 10, and
20, respectively, and 347.12%, 97.12%, and 30.84% for $(a, b) = (0.01, 3)$ and the
same values of $n$. The corresponding errors of approximate standard deviations are
0.703%, 0.0129%, and 0.0123% for $(a, b) = (1, 3)$ and 290.99%, 128.39%, and
58.12% for $(a, b) = (0.01, 3)$. Improvements result by increasing the degree of the
interpolating polynomials. For example, the mean and standard deviation of $\tilde U$ are
in error by 7.39% and 19.44%, respectively, for $(a, b) = (0.01, 3)$ and $n = 40$.

The solution of $AU = 1$ with $A \sim U(0.01, 3)$ in Example 8.5 using a SROM
with $m = 20$ provides superior approximations for the mean and standard deviation
of $U$. The errors of these approximations are 2.76% and −15.89%, respectively. ◊


8.2.6 Reliability Method 

We have seen that (8.30) can be used to map an arbitrary non-Gaussian $\mathbb{R}^{d_x}$-valued
random variable $X$ into a Gaussian vector $G = (G_1, \ldots, G_{d_x})$ with independent
$N(0, 1)$ coordinates, so that $X$ in the definition of (8.1) can be assumed to coincide
with $G$ without loss of generality.

A common objective in reliability studies is the calculation of the probability

$$
p_s = P(h(U) \le 0) = P(h(U(G)) \le 0) = P(G \in D) = \int_D \phi(x)\, dx, \tag{8.45}
$$

where $D = \{x \in \mathbb{R}^{d_x} : g(x) \le 0\}$ denotes a safe set, $g(x) = h(u(x))$, $h : \mathbb{R}^d \to \mathbb{R}$
is a measurable function, and $\phi(x) = \prod_{i=1}^{d_x} \exp(-x_i^2/2)/\sqrt{2\pi}$.

Generally, the integral $\int_D \phi(x)\, dx$ cannot be obtained analytically. Numerical
solutions, for example, Monte Carlo simulation or quadratures, are usually
inefficient when $d_x$ is large. Alternative solutions are offered by reliability methods. The
simplest reliability method approximates $p_s$ by

$$
p_s \simeq \Phi(\beta), \tag{8.46}
$$

where $\Phi$ denotes the distribution of $N(0, 1)$ and $\beta$ is the minimum distance from
the origin of $\mathbb{R}^{d_x}$ to the boundary of $D$. Theoretical considerations regarding the
construction of (8.46) and related approximations can be found in [8] (Chap. 9).
Note that the approximation in (8.46) replaces the calculation of the multidimensional
integral $\int_D \phi(x)\, dx$ in (8.45) with the solution of a constrained optimization problem
that finds the point $x^* \in \mathbb{R}^{d_x}$ on the boundary $\partial D = \{x \in \mathbb{R}^{d_x} : g(x) = 0\}$ of $D$
that has the smallest norm. Details on optimization algorithms for finding $x^*$ and
numerical examples can be found in [26].
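
For a linear limit state the minimization behind (8.46) has a closed form, which makes the approximation easy to illustrate. The sketch below assumes a safe set of the form $D = \{x : c \cdot x - b \le 0\}$ with $b > 0$; this special case, and the name `reliability_linear`, are assumptions chosen for illustration. For nonlinear $g$ a constrained optimizer would be needed to locate $x^*$.

```python
import math
from statistics import NormalDist


def reliability_linear(c, b):
    """Return (beta, x_star, Phi(beta)) for the safe set D = {x : c.x - b <= 0}, b > 0."""
    norm_c = math.sqrt(sum(ci * ci for ci in c))
    beta = b / norm_c                              # distance from the origin to {c.x = b}
    x_star = [b * ci / norm_c ** 2 for ci in c]    # boundary point with the smallest norm
    return beta, x_star, NormalDist().cdf(beta)
```

In this linear case $c \cdot G$ is Gaussian with standard deviation $\|c\|$, so $\Phi(\beta)$ coincides with $P(G \in D)$; for curved boundaries (8.46) is only an approximation.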

The reliability method differs from the other methods for solving stochastic
algebraic equations in both focus and approach. The method has been developed to
calculate probabilities that the solution $U$ of (8.1) belongs to a specified subset of
$\mathbb{R}^d$. Calculation of moments of $U$ by this method is usually inefficient. The method
is based on asymptotic properties of multidimensional integrals of Gaussian
probability density functions [5]. Useful numerical comparisons between the accuracy of
the stochastic Galerkin and the reliability methods can be found in [26].


8.3 SAEs with Small Uncertainty 

Throughout this section we consider the special case of (8.1) in which $B$ is
deterministic and $A$ depends on an $\mathbb{R}^{d_x}$-valued random variable $X$ that has small uncertainty.
The assumption that $B$ is deterministic is not restrictive. We apply the Taylor series,
perturbation series, Neumann series, and equivalent linearization methods to solve
approximately this class of stochastic algebraic equations. The focus is on the first
two moments of the solution $U$ of (8.1).


8.3.1 Taylor Series


Let $\mu_x = \{\mu_{x,p}\}$ and $\gamma_x = \{\gamma_{x,pq}\}$, $p, q = 1, \ldots, d_x$, denote the mean and the
covariance matrices of $X$. If the mapping $X \mapsto U(X)$ has continuous second order
partial derivatives, then

$$
U(X) \simeq \tilde U(X) = U(\mu_x) + \sum_{p=1}^{d_x} \frac{\partial U(\mu_x)}{\partial x_p}\, (X_p - \mu_{x,p}) \tag{8.47}
$$

by using the first two terms of the Taylor expansion of $U(X)$ about the mean of $X$,
where $\partial U(\mu_x)/\partial x_p$ is a vector with coordinates $\partial U_i(\mu_x)/\partial x_p$, $i = 1, \ldots, d$.

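Theorem 8.16 The approximate mean and covariance matrices of $U$ given by its
representation in (8.47) are

$$
\begin{aligned}
\tilde\mu_u &= U(\mu_x) \\
\tilde\gamma_u &= \sum_{p,q=1}^{d_x} \frac{\partial U(\mu_x)}{\partial x_p} \Big( \frac{\partial U(\mu_x)}{\partial x_q} \Big)' \gamma_{x,pq},
\end{aligned}
\tag{8.48}
$$

and depend on the first two moments of $X$.

Proof The Taylor expansion of $U(X)$ about $x^{(0)} \in \mathbb{R}^{d_x}$ is

$$
U(X) = U(x^{(0)}) + \sum_{p=1}^{d_x} \frac{\partial U(x^{(0)})}{\partial x_p}\, (X_p - x_p^{(0)}) + R_2(X, x^{(0)}),
$$

where

$$
R_2(X, x^{(0)}) = \frac{1}{2} \sum_{p,q=1}^{d_x} \frac{\partial^2 U(X^*)}{\partial x_p\, \partial x_q}\, (X_p - x_p^{(0)})(X_q - x_q^{(0)})
$$

and $X^* = \theta x^{(0)} + (1 - \theta) X$, $\theta \in (0, 1)$ ([4], Theorem 21.1). The approximation in
(8.47) is the above Taylor expansion for $x^{(0)} = \mu_x$ without the remainder $R_2$. The
error $R_2(X, \mu_x) = U(X) - \tilde U(X)$ of $\tilde U(X)$ in (8.47) depends on properties of $X$ and
of the mapping $X \mapsto U(X)$ in a neighborhood of $\mu_x$. ▲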

Theorem 8.17 The second moment properties of $\tilde U(X)$ in (8.48) can be calculated
from

$$
U(\mu_x) = A(\mu_x)^{-1} B \quad \text{and} \tag{8.49}
$$

$$
\frac{\partial U(\mu_x)}{\partial x_p} = -A(\mu_x)^{-1}\, \frac{\partial A(\mu_x)}{\partial x_p}\, U(\mu_x). \tag{8.50}
$$

Proof The solution of (8.1) with $X = \mu_x$ gives $U(\mu_x)$. The derivative,

$$
\frac{\partial A(X)}{\partial X_p}\, U(X) + A(X)\, \frac{\partial U(X)}{\partial X_p} = \frac{\partial B}{\partial X_p},
$$

of (8.1) with respect to coordinate $X_p$ of $X$ gives (8.50) for $X = \mu_x$ since $\partial B/\partial X_p = 0$
by assumption. The rates of change $\partial U(\mu_x)/\partial x_p$ of $U$ relative to the coordinates of
$X$ are referred to as sensitivity factors. If the sensitivity factor of $U$ with respect to a
coordinate $X_p$ of $X$ and the uncertainty in $X_p$ are small relative to the other coordinates
of $X$, the uncertainty in $X_p$ has a minor contribution to the uncertainty in $U$ so that
we can set $X_p = \mu_{x,p}$. ▲

Fig. 8.11 Histograms of $U_1$ and $U_2$

Previous observations show that the approximation of $U$ in (8.47) has the form

$$
U \simeq \tilde U(X) = A(\mu_x)^{-1} \Big[ I - \sum_{p=1}^{d_x} \frac{\partial A(\mu_x)}{\partial x_p}\, A(\mu_x)^{-1}\, (X_p - \mu_{x,p}) \Big] B, \tag{8.51}
$$

where $I$ denotes the $(d, d)$ identity matrix. The extension to the case in which $A$ and
$B$ depend on $X$ results by similar arguments (Exercise 8.11).
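
The first order Taylor moments in (8.48)-(8.50) can be computed as sketched below. The function name `taylor_moments`, its arguments, and in particular the use of central finite differences for $\partial A/\partial x_p$ are assumptions made for this illustration; analytical derivatives of $A$ would normally be preferred when available.

```python
import numpy as np


def taylor_moments(a_fun, b_vec, mean_x, cov_x, step=1e-6):
    """Approximate mean and covariance of U = A(X)^{-1} B with B deterministic."""
    mean_x = np.asarray(mean_x, dtype=float)
    cov_x = np.atleast_2d(np.asarray(cov_x, dtype=float))
    a0 = a_fun(mean_x)
    u0 = np.linalg.solve(a0, b_vec)                       # (8.49): U(mu_x)
    grads = []
    for p in range(mean_x.size):
        e = np.zeros_like(mean_x)
        e[p] = step
        # Central difference for dA/dx_p at mu_x (assumption, see lead-in).
        d_a = (a_fun(mean_x + e) - a_fun(mean_x - e)) / (2.0 * step)
        grads.append(-np.linalg.solve(a0, d_a @ u0))      # (8.50): sensitivity factor
    jac = np.column_stack(grads)                          # shape (d, d_x)
    return u0, jac @ cov_x @ jac.T                        # (8.48)
```

For the scalar case $A = X$, $B = 1$ the sketch returns the familiar results $1/\mu_x$ and $\gamma_x/\mu_x^4$ for the approximate mean and variance.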

Example 8.19 Let $U = (U_1, U_2)$ be the solution of (8.1) with

$$
A = \begin{bmatrix} \cos(\Theta_1) & \cos(\Theta_2) \\ \sin(\Theta_1) & \sin(\Theta_2) \end{bmatrix}
\quad \text{and} \quad
B = \begin{bmatrix} 0 \\ 1 \end{bmatrix},
$$

where $\Theta_1 = \tan^{-1}\big(X_2/(X_1 + a)\big)$, $\Theta_2 = \tan^{-1}\big((a - X_2)/(X_1 + a)\big)$, $X_1$ and $X_2$
are independent random variables uniformly distributed in $(-\varepsilon a, \varepsilon a)$, and $0 < \varepsilon <
1/3$. This stochastic algebraic equation is the equilibrium condition for a system with
random imperfections whose solution $U = (U_1, U_2)$ gives forces in the system. If
$\varepsilon = 0$, that is, the system has no imperfections, $U_1 = -1$ and $U_2 = \sqrt{2}$.

Figure 8.11 shows histograms of $U_1$ and $U_2$ obtained from 1000 independent
realizations of the system for $a = 1$, $\varepsilon = 0.3$, and $q = 1$. The estimated mean,
standard deviation, skewness, and kurtosis are −1.16, 0.52, 1.23, and 4.17 for $U_1$
and 1.58, 0.52, 1.18, and 3.94 for $U_2$.

The approximate means of $U_1$ and $U_2$ given by the first order Taylor
representation in (8.49) are −1 (−14%) and $\sqrt{2}$ (−10.65%). The corresponding standard
deviations derived from (8.48) and (8.50) are 0.6 (14.81%) and 0.49 (−6.42%). The
numbers in parentheses give errors relative to the Monte Carlo simulation. ◊

The approximate moments of $U$ in (8.48) can be improved by retaining additional
terms from the Taylor series representation of the mapping $X \mapsto U(X)$. Under the
assumption that $X \mapsto U(X)$ has continuous third order partial derivatives, $U$ can be
approximated by

$$
U(X) \simeq \tilde U(X) = U(\mu_x) + \sum_{u=1}^{d_x} \frac{\partial U(\mu_x)}{\partial x_u}\, (X_u - \mu_{x,u})
+ \frac{1}{2} \sum_{u,v=1}^{d_x} \frac{\partial^2 U(\mu_x)}{\partial x_u\, \partial x_v}\, (X_u - \mu_{x,u})(X_v - \mu_{x,v}). \tag{8.53}
$$

The calculation of the second moment properties of $\tilde U$ based on this approximation
requires information on $X$ beyond its first two moments (Exercise 8.13).

Nonlinear stochastic algebraic equations can also be solved approximately by the
Taylor series method. Let U be the solution of

\[
A U + N = B, \tag{8.54}
\]

where A is as in (8.1) and N is a d-dimensional vector whose coordinates are nonlinear
real-valued functions of (X, U). If (8.47) is used to calculate the second moment
properties of U, we will need $U(\mu_x)$ and $\{\partial U(\mu_x)/\partial x_p\}$. Note that $U(\mu_x)$ is the
solution of (8.54) with $X = \mu_x$ and that $\partial U(\mu_x)/\partial x_p$ can be obtained from the
derivatives of (8.54) with respect to the coordinates $X_p$ of X, that is,

\[
\frac{\partial A}{\partial X_p}\, U + A\, \frac{\partial U}{\partial X_p} + \frac{\partial N}{\partial X_p} + \sum_{i=1}^{d} \frac{\partial N}{\partial U_i} \frac{\partial U_i}{\partial X_p} = 0,
\]

for $X = \mu_x$. The partial derivatives of U with respect to $X_p$ can be calculated since
the functions $X \mapsto A(X)$ and $(X, U) \mapsto N(X, U)$ are known.


8.3.2 Perturbation Series 

Let $A = \bar A + \varepsilon R$ in (8.1), where $\bar A = E[A]$ and R are (d, d) deterministic and random
matrices and $\varepsilon$ is a small parameter. As previously, it is assumed that X and R have
finite first two moments. Note that $E[R_{ij}] = 0$, i, j = 1, . . . , d, by construction.

Theorem 8.18 The perturbation solution of (8.1) is

\[
U = U^{(0)} + \varepsilon U^{(1)} + \varepsilon^2 U^{(2)} + O(\varepsilon^3), \quad \text{where} \quad
U^{(r)} = -\bar A^{-1} R\, U^{(r-1)}, \ r = 1, 2, \ldots, \ \text{and} \ U^{(0)} = \bar A^{-1} B, \tag{8.55}
\]

so that the approximate first two moments of U are

\[
\begin{aligned}
E[U] &= \bar A^{-1} B + \varepsilon^2 \bar A^{-1} E[R \bar A^{-1} R]\, \bar A^{-1} B + O(\varepsilon^3), \\
E[U U'] &= E[U^{(0)} (U^{(0)})'] + \varepsilon^2 E[U^{(0)} (U^{(2)})' + U^{(1)} (U^{(1)})' + U^{(2)} (U^{(0)})'] + O(\varepsilon^3).
\end{aligned} \tag{8.56}
\]

Proof Since the random part of A is small, it is expected that the solution U does not
differ significantly from the solution $\bar A^{-1} B$ of (8.1) with $\varepsilon = 0$. If the perturbation
solution in (8.55) is not singular ([15], Sect. 1.2), the representation of U in (8.55)
holds, and (8.1) takes the form

\[
(\bar A + \varepsilon R)(U^{(0)} + \varepsilon U^{(1)} + \varepsilon^2 U^{(2)} + \cdots) = B \quad \text{or}
\]

\[
(\bar A U^{(0)} - B) + \varepsilon (\bar A U^{(1)} + R U^{(0)}) + \varepsilon^2 (\bar A U^{(2)} + R U^{(1)}) + \cdots = 0.
\]

This power series in $\varepsilon$ is zero if $\bar A U^{(0)} = B$ and $\bar A U^{(r)} = -R U^{(r-1)}$ for r = 1, 2, . . .
by a fundamental theorem of perturbation theory ([24], p. 12). Hence, the perturbation
solution of order $p \ge 1$ has the expression

\[
\tilde U^{(p)} = \sum_{k=0}^{p} (-1)^k \varepsilon^k (\bar A^{-1} R)^k \bar A^{-1} B, \tag{8.57}
\]

where $(\bar A^{-1} R)^k$ is the identity matrix for k = 0. Note that the equations for $U^{(r)}$,
r = 0, 1, . . . , have the same deterministic operator. ▲

Example 8.20 Let A and B in (8.52) be matrices defining the stochastic algebraic
equation in (8.1), and let $\bar A = E[A]$ and $\varepsilon R = A - \bar A$ denote the deterministic and
random parts of A, respectively. The approximate means of $U_1$ and $U_2$ based on the
first order perturbation are −1.0130 (1.30%) and 1.4245 (0.73%) for $\varepsilon = 0.3$.
The corresponding standard deviations of $U_1$ and $U_2$ are 0.3825 (−26.81%) and
0.3831 (−26.82%). The numbers in parentheses give errors relative to the Monte
Carlo solution in Example 8.19. ◊

The calculation of moments of U based on its representation in (8.57) involves
expectations of powers of $\bar A^{-1} R$ that can be calculated efficiently by Monte Carlo
simulation. Note also that expansions $\tilde U^{(p)}$ of different orders need to be used to
calculate moments of U with the same accuracy. For example, the error of the mean
of $\tilde U^{(1)}$ is of order $\varepsilon^2$. To obtain an approximation of the same order for the standard
deviation of a coordinate of U, we need to use the perturbation solution $\tilde U^{(2)}$. The
perturbation method can be applied under some conditions to nonlinear equations of
the type in (8.54), but its application is rather involved. Note also that the assumption that
B is deterministic is not restrictive since the approximation $\tilde U^{(p)}$ of U in (8.57) holds
for B random.
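The recursion behind (8.57) can be illustrated for a single sample of R. The sketch below is a minimal Python example under stated assumptions: the (2, 2) matrices `a_bar` and `r_sample`, the vector `b`, and the value of `eps` are hypothetical numbers chosen only for illustration, and the helper names are not from the text. It sums the terms $(-\varepsilon \bar A^{-1} R)^k \bar A^{-1} B$ and compares each truncation with the exact solution of $(\bar A + \varepsilon R) U = B$.

```python
import math

def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]

def matvec(m, v):
    """Product of a 2x2 matrix and a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def perturbation_solution(a_bar, r, b, eps, order):
    """Sum over k = 0..order of (-eps)^k (a_bar^{-1} r)^k a_bar^{-1} b."""
    a_inv = inv2(a_bar)
    term = matvec(a_inv, b)              # zeroth order term
    total = list(term)
    for _ in range(order):
        # Next term: -eps * a_bar^{-1} r applied to the previous term.
        term = [-eps * t for t in matvec(a_inv, matvec(r, term))]
        total = [total[0] + term[0], total[1] + term[1]]
    return total

# Hypothetical data: deterministic part, one sample of R, right side, small parameter.
a_bar = [[2.0, 1.0], [1.0, 3.0]]
r_sample = [[0.5, -0.2], [0.3, 0.1]]
b = [1.0, 2.0]
eps = 0.1

a_full = [[a_bar[i][j] + eps * r_sample[i][j] for j in range(2)] for i in range(2)]
u_exact = matvec(inv2(a_full), b)
errors = [math.dist(perturbation_solution(a_bar, r_sample, b, eps, p), u_exact)
          for p in range(4)]
```

For this sample the Euclidean error shrinks by roughly a factor $\varepsilon \|\bar A^{-1} R\|$ with each additional order. Moments of U would follow by averaging such truncations over samples of R.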
8.3.3 Neumann Series

The Taylor series and the perturbation methods calculate moments of the solution U
of (8.1) approximately without finding $A^{-1}$. The Neumann series method constructs
an approximation for $A^{-1}$, and uses this approximation to calculate properties of U.
The construction of an approximation for $A^{-1}$ is based on the following result.

Theorem 8.19 If C is a (d, d) deterministic matrix such that $\|Cx\| \le \gamma \|x\|$,
$0 < \gamma < 1$, for all $x \in \mathbb{R}^d$, then the series

\[
(I + C)^{-1} = \sum_{r=0}^{\infty} (-1)^r C^r \tag{8.58}
\]

is absolutely convergent, where I denotes the (d, d) identity matrix, $C^0 = I$, and $\|x\|$ is
the Euclidean norm of $x \in \mathbb{R}^d$ ([27], Chap. 2).

Proof The sequence $S_k x = \sum_{r=0}^{k} (-1)^r C^r x$ is Cauchy in $\mathbb{R}^d$ since, for k > l,

\[
\|S_k x - S_l x\| \le \sum_{r=l+1}^{k} \|C^r x\| \le \sum_{r=l+1}^{k} \gamma^r \|x\| = \frac{\gamma^{l+1} - \gamma^{k+1}}{1 - \gamma}\, \|x\| \to 0
\]

as $k, l \to \infty$. Since $\mathbb{R}^d$ is complete ([6], Theorem 3.8), $\{S_k x\}$ has a unique limit

\[
S x = \lim_{k \to \infty} S_k x = \lim_{k \to \infty} \sum_{r=0}^{k} (-1)^r C^r x = \sum_{r=0}^{\infty} (-1)^r C^r x.
\]

It remains to show $S = (I + C)^{-1}$, that is, $S(I + C)x = (I + C)Sx = x$ for each
x. We have

\[
\|S_k (I + C) x - x\| = \|S_k x + S_k C x - x\| = \|(-1)^k C^{k+1} x\| \le \gamma^{k+1} \|x\| \to 0
\]

as $k \to \infty$, so that $\lim_{k \to \infty} S_k (I + C) x = x$ and $S(I + C)x = x$. Similar arguments
give $(I + C) S x = x$. That $\sum_{r=0}^{\infty} (-1)^r C^r x$ is absolutely convergent follows from

\[
\sum_{r=0}^{\infty} \|(-1)^r C^r x\| = \sum_{r=0}^{\infty} \|C^r x\| \le \sum_{r=0}^{\infty} \gamma^r \|x\| = \frac{\|x\|}{1 - \gamma}
\]

since $\|x\|/(1 - \gamma)$ is finite. ▲

Example 8.21 Let $U \in \mathbb{R}$ be the solution of (8.1) with d = 1, $A = \bar A + R$,
$\bar A = E[A]$, R a random variable, and B a constant. If $|R/\bar A| \le \gamma < 1$ a.s., the
sequence $\sum_{r=0}^{k} (-1)^r (R/\bar A)^r (B/\bar A)$ converges a.s. to U as $k \to \infty$, so that the
solution of $(\bar A + R) U = B$ is $U = \sum_{r=0}^{\infty} (-1)^r (R/\bar A)^r (B/\bar A)$. ◊
Proof An alternative form of $(\bar A + R) U = B$ is $(1 + \bar A^{-1} R) U = \bar A^{-1} B$ so that
$U = (1 + \bar A^{-1} R)^{-1} (\bar A^{-1} B)$. Let $S_k(\omega) = 1 + \sum_{r=1}^{k} (-1)^r \bar A^{-r} R(\omega)^r$ be a sample
of $S_k$. Since $|R/\bar A| \le \gamma < 1$ a.s., the numerical sequence $S_k(\omega)$ satisfies

\[
|S_k(\omega)| \le 1 + \sum_{r=1}^{k} |\bar A^{-r} R(\omega)^r| \le 1 + \sum_{r=1}^{k} \gamma^r = \frac{1 - \gamma^{k+1}}{1 - \gamma} \le \frac{1}{1 - \gamma} < \infty
\]

and has a limit as $k \to \infty$ for almost all $\omega \in \Omega$. ▲
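The almost sure convergence in Example 8.21 can be checked sample by sample. The sketch below is a minimal Python example: the values of `a_bar`, `b`, and `gamma`, the uniform law of R, and the function name are assumptions made only for illustration. It compares the partial sums of the Neumann series with the exact solution $B/(\bar A + R)$ and with the geometric tail bound $\gamma^{k+1} |B/\bar A| / (1 - \gamma)$.

```python
import random

def neumann_partial_sum(r, a_bar, b, k):
    """Partial sum: sum over j = 0..k of (-1)^j (r / a_bar)^j (b / a_bar)."""
    ratio = -r / a_bar
    term = b / a_bar
    total = term
    for _ in range(k):
        term *= ratio
        total += term
    return total

# Hypothetical data for the scalar equation (a_bar + R) U = b.
a_bar, b, gamma = 2.0, 3.0, 0.5
rng = random.Random(1)
# Samples of R with |R / a_bar| <= gamma; the uniform law is an illustrative choice.
r_samples = [rng.uniform(-gamma * a_bar, gamma * a_bar) for _ in range(200)]

k = 30
worst_error = max(abs(neumann_partial_sum(r, a_bar, b, k) - b / (a_bar + r))
                  for r in r_samples)
# Geometric tail bound on the truncation error of the Neumann series.
bound = gamma ** (k + 1) / (1.0 - gamma) * abs(b / a_bar)
```

The largest truncation error over the samples stays below the bound, which does not depend on the sample of R as long as $|R/\bar A| \le \gamma$.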

Theorem 8.20 If the entries of A in (8.1) have finite variance and $\|\bar A^{-1} R x\| \le \gamma \|x\|$,
$0 < \gamma < 1$, a.s. for all $x \in \mathbb{R}^d$, then the Neumann series representation of the solution U of this
equation is

\[
U = \sum_{r=0}^{\infty} U^{(r)}, \quad \text{where} \quad U^{(r)} = -\bar A^{-1} R\, U^{(r-1)}, \ r = 1, 2, \ldots, \ \text{with} \ U^{(0)} = \bar A^{-1} B, \tag{8.59}
\]

where $\bar A = E[A]$ and $R = A - \bar A$, so that the mean and covariance matrices of
$U \simeq \tilde U = U^{(0)} + U^{(1)}$ are $\mu_u = \bar A^{-1} B$ and $\gamma_u = E[(\bar A^{-1} R \bar A^{-1} B)(\bar A^{-1} R \bar A^{-1} B)']$.

Proof The solution U of (8.1) can be given in the form $U = (\bar A + R)^{-1} B =
(I + \bar A^{-1} R)^{-1} \bar A^{-1} B$, so that $U = \sum_{r=0}^{\infty} (-1)^r (\bar A^{-1} R)^r \bar A^{-1} B$ by (8.58). The
approximate moments of U in (8.59) follow from the representation $\tilde U = U^{(0)} + U^{(1)}$
of U by straightforward calculations. ▲

As for the perturbation and Taylor series methods, the assumption that B is deterministic
is not restrictive. Moments and other properties of U can be calculated from
truncations $\tilde U^{(n)} = \sum_{r=0}^{n} (-1)^r (\bar A^{-1} R)^r \bar A^{-1} B$ of the Neumann series representation
$U = \sum_{r=0}^{\infty} (-1)^r (\bar A^{-1} R)^r \bar A^{-1} B$. Since the functional form of $\tilde U^{(n)}$ is available,
its properties can be estimated efficiently by Monte Carlo simulation.

8.3.4 Equivalent Linearization 

Let $\tilde U = h(X)$ be a trial solution for (8.1) with A depending on X and B assumed
to be deterministic, where $h : \mathbb{R}^{d_x} \to \mathbb{R}^d$ is a measurable function. Our objective
is to find the function h that minimizes the error $\|U - \tilde U\| = E[(U - \tilde U)'(U - \tilde U)]^{1/2}$. This
optimization problem cannot be solved since U is unknown. We consider an alternative
optimization problem that selects a function h that minimizes the discrepancy
$\|A\, h(X) - B\|$ between the left side of (8.1) with $\tilde U$ in place of U and the right side
of this equation.

Definition 8.4 The equivalent linear solution of (8.1) has the form $\tilde U = \alpha X + \beta$,
where $\alpha$ and $\beta$ are $(d, d_x)$ and (d, 1) matrices with entries selected to minimize the
mean square error $\|A(\alpha X + \beta) - B\|$.

Fig. 8.12 Densities of U defined by AU = 1 for $A \sim N(\mu, \sigma^2)$ with ($\mu = 0$, $\sigma = 1$) (left panel)
and ($\mu = 1$, $\sigma = 1$) (right panel)

Example 8.22 Let d = 1, B > 0 a constant, and A = X a lognormal variable with
mean $\mu$, variance $\sigma^2$, and coefficient of variation $v = \sigma/\mu$. The equivalent linear
solution is $\tilde U = \alpha X + \beta$ with $\alpha = -B \mu^{-2} (1 + v^2)^{-4}$ and $\beta = B \mu^{-1} (2 + v^2)(1 + v^2)^{-2}$
obtained by minimizing the objective function $e = E[(B - \alpha X^2 - \beta X)^2]$,
that is, imposing the conditions $\partial e/\partial \alpha = 0$ and $\partial e/\partial \beta = 0$. The approximate
mean and variance of U are $E[\tilde U] = B \mu^{-1} (1 + v^2)^{-4} (1 + 5v^2 + 4v^4 + v^6)$ and
$\mathrm{Var}[\tilde U] = B^2 \mu^{-2} (1 + v^2)^{-8} v^2$. ◊
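The coefficients in Example 8.22 can be verified by solving the two normal equations directly. The sketch below is a minimal Python check that uses the standard lognormal moment formula $E[X^k] = \mu^k (1 + v^2)^{k(k-1)/2}$. The numerical values of `b`, `mu`, and `v` and the function names are arbitrary illustrative choices, not values from the text.

```python
def lognormal_moment(k, mu, v):
    """E[X^k] for a lognormal X with mean mu and coefficient of variation v."""
    return mu ** k * (1.0 + v * v) ** (k * (k - 1) / 2.0)

def equivalent_linear_coefficients(b, mu, v):
    """Minimize E[(b - alpha X^2 - beta X)^2] over (alpha, beta)."""
    m1, m2, m3, m4 = (lognormal_moment(k, mu, v) for k in (1, 2, 3, 4))
    # Normal equations: alpha*m4 + beta*m3 = b*m2 and alpha*m3 + beta*m2 = b*m1.
    det = m4 * m2 - m3 * m3
    alpha = b * (m2 * m2 - m1 * m3) / det
    beta = b * (m1 * m4 - m2 * m3) / det
    return alpha, beta

# Arbitrary illustrative values.
b, mu, v = 2.0, 1.5, 0.4
alpha, beta = equivalent_linear_coefficients(b, mu, v)

w = 1.0 + v * v
alpha_closed = -b / (mu ** 2 * w ** 4)
beta_closed = b * (2.0 + v * v) / (mu * w ** 2)
mean_closed = b / (mu * w ** 4) * (1.0 + 5.0 * v ** 2 + 4.0 * v ** 4 + v ** 6)
```

The numerical solution of the normal equations agrees with the closed forms for $\alpha$, $\beta$, and the approximate mean $\alpha \mu + \beta$.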


The linearization method may provide limited, if any, information on the probability
law of U since $\tilde U$ depends linearly on X while U is a nonlinear function of this random
vector.

Example 8.23 Consider the scalar equation AU = 1 with $A \sim N(\mu, \sigma^2)$.
Figure 8.12 shows densities of the solution U of AU = 1 for two values of $(\mu, \sigma)$.
These densities differ significantly from that of the solution $\tilde U = \alpha A + \beta$ obtained
by the equivalent linearization method, which is a Gaussian variable. ◊


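Proof Since P(A = 0) = 0, the distribution of U = 1/A is

\[
P(U \le u) = P(A < 0) + P(A \ge 1/u) = \Phi\Big(-\frac{\mu}{\sigma}\Big) + 1 - \Phi\Big(\frac{1/u - \mu}{\sigma}\Big), \quad u > 0,
\]

\[
P(U \le u) = P\big(A \in [1/u, 0)\big) = \Phi\Big(-\frac{\mu}{\sigma}\Big) - \Phi\Big(\frac{1/u - \mu}{\sigma}\Big), \quad u < 0,
\]

so that its density is $f_U(u) = (\sigma u^2)^{-1} \big[\phi((\mu - 1/u)/\sigma)\, 1(u > 0) + \phi((1/u - \mu)/\sigma)\, 1(u < 0)\big]$,
$u \in \mathbb{R}$, where $\Phi$ and $\phi$ denote the distribution and the density of the standard Gaussian variable. ▲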
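The density of U = 1/A can be checked against its distribution by numerical differentiation. The sketch below is a minimal Python example that uses only `math.erf`. It assumes the distribution is written as $P(U \le u) = P(A < 0) + P(A \ge 1/u)$ for u > 0 and $P(1/u \le A < 0)$ for u < 0; the function names, the evaluation points, and the step size are illustrative choices.

```python
import math

def std_normal_cdf(z):
    """Distribution of the standard Gaussian variable."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def std_normal_pdf(z):
    """Density of the standard Gaussian variable."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def cdf_u(u, mu, sigma):
    """P(1/A <= u) for A ~ N(mu, sigma^2) and u != 0."""
    p_negative = std_normal_cdf(-mu / sigma)           # P(A < 0)
    p_below = std_normal_cdf((1.0 / u - mu) / sigma)   # P(A <= 1/u)
    if u > 0:
        return p_negative + 1.0 - p_below
    return p_negative - p_below

def pdf_u(u, mu, sigma):
    """Density of 1/A; the Gaussian density is even, so one formula covers both signs of u."""
    return std_normal_pdf((1.0 / u - mu) / sigma) / (sigma * u * u)

def central_difference(u, mu, sigma, step=1e-5):
    """Numerical derivative of the distribution at u."""
    return (cdf_u(u + step, mu, sigma) - cdf_u(u - step, mu, sigma)) / (2.0 * step)

mu, sigma = 1.0, 1.0
density_gaps = [abs(central_difference(u, mu, sigma) - pdf_u(u, mu, sigma))
                for u in (0.7, -1.3, 2.5)]
```

The gaps between the numerical derivative and the density are at the level of the finite difference error, and the distribution approaches 0 and 1 for large negative and positive u.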


8.4 Exercises 

Exercise 8.1. Show that the estimator in (8.9) is unbiased and that its variance vanishes
as $n \to \infty$.

Exercise 8.2. Repeat the calculations in Example 8.5 for a different choice of $\{x^{(k)}\}$,
k = 1, . . . , m, in [a, b].

Exercise 8.3. Complete the proof of Theorem 8.4.

Exercise 8.4. Use the substitution principle in (8.15) to smooth the SROM-based 
distributions of the eigenvalues of matrix A in (8.16). 

Exercise 8.5. Let $U^{(n)}$, n = 1, 2, . . . , and U be $\mathbb{R}^d$-valued random variables in
$L^2(\Omega, \mathcal{F}, P)$. Show that if $U^{(n)}$ converges weakly to U, that is, $\langle U^{(n)}, W \rangle \to
\langle U, W \rangle$ as $n \to \infty$, $\forall W \in L^2(\Omega, \mathcal{F}, P)$, then its weak limit U is unique.

Exercise 8.6. Show that in a finite dimensional Hilbert space H weak and strong 
convergence are equivalent concepts. 

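Hint Let $(e^{(1)}, \ldots, e^{(m)})$ be an orthonormal basis in H and let $x_n$, n = 1, 2, . . . , be
a sequence in H converging weakly to $x \in H$. Since $x_n - x = \sum_{i=1}^{m} \big(\langle x_n, e^{(i)} \rangle -
\langle x, e^{(i)} \rangle\big) e^{(i)}$, we have $\|x_n - x\|^2 = \sum_{i=1}^{m} |\langle x_n, e^{(i)} \rangle - \langle x, e^{(i)} \rangle|^2$.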

Exercise 8.7. Develop an optimization algorithm that delivers a SROM for a bivariate
translation vector with coordinates $X_1 \sim U(0, 1)$ and $X_2 \sim \mathrm{EXP}(\lambda)$, $\lambda > 0$, so
that $X_1 = \Phi(G_1)$, $X_2 = -\ln(1 - \Phi(G_2))/\lambda$, and $(G_1, G_2)$ is a standard bivariate
Gaussian vector with correlation coefficient $\rho = E[G_1 G_2]$.

Exercise 8.8. Find polynomial chaos expansions of increasing order n for $X =
|N(0, 1)|$. Calculate and plot fourth order moments for X as a function of n. Comment
on your findings.

Exercise 8.9. Repeat calculations in Example 8.17 by using Bernstein polynomials 
defined by (8.42), rather than Lagrange polynomials. 

Exercise 8.10. Develop an optimization algorithm for calculating the probability
$P(g(X) \le 0)$ by the approximation in (8.46), where X is a two-dimensional vector
with independent N(0, 1) coordinates and $g(x) = (x_1/a_1)^2 + (x_2/a_2)^2 - 1$
with $(a_1, a_2) = (2, 1)$ and $(a_1, a_2) = (4, 3)$. Evaluate the accuracy of the resulting
approximations by Monte Carlo simulation.

Exercise 8.11. Extend the approximate solution in (8.51) to the case in which A and
B depend on X.

Exercise 8.12. Derive approximate expressions for the first two moments of U 
defined by (8.54). Assume the coordinates of A are A; = Uf, i = 1, . . . , d. 

Exercise 8.13. Find the approximate second moments properties of U defined by 
AU = B with A and B in (8.52) by using the representation in (8.53). 

Exercise 8.14. Solve the SAE in Example 8.19 by the perturbation and Neumann 
series methods. 

Exercise 8.15. Complete the calculations in Example 8.22. 
Chapter 9

Stochastic Partial Differential Equations 


9.1 Introduction 

Deterministic partial differential equations (PDEs) have been used extensively in 
science and engineering to predict the behavior and assess the performance of a 
broad range of physical systems. These equations are likely to describe satisfactorily 
the behavior of a system at the macroscopic scale, but may provide insufficient, or 
even incorrect, information on the system state at the microscopic and mesoscopic 
scales [1]. At small scales, the coefficients of PDEs describing a system state fluctuate
randomly in space. Probabilistic models, for example, random fields, need to be 
employed to characterize these coefficients [2-6]. We refer to the class of PDEs 
with random coefficients, source term, and/or end conditions as stochastic partial 
differential equations (SPDEs). 

There are notable differences between the types of SPDEs in the mathematical 
and applied literature and between the methods used to solve these equations. In the 
applied literature, the random elements in the definition of SPDEs are usually colored 
random functions of space and time that may or may not have smooth samples. 
Weak forms of these equations are used to establish conditions for the existence 
and uniqueness of their solutions and construct numerical algorithms. The random 
elements in the definition of SPDEs in the mathematical literature are primarily white 
noise random functions of space and time. A generalized version of Ito’s formula for 
semimartingales is the essential tool for analysis. 

Our objectives are to establish conditions for the existence and uniqueness of the 
solutions of SPDEs and examine methods for solving these equations. It is assumed 
that both the functional form of the SPDEs and the probability law of the random 
elements in their definitions are known. The following two sections focus on SPDEs 
commonly considered in the mathematical literature. It is shown in Sect. 9.3 that 
approximate versions of these equations obtained by space and time discretization 
can be solved by the methods in Chaps. 7 and 8. Sections 9.4 and 9.5 deal with SPDEs 
encountered in applications with random entries that have arbitrary and small uncer- 
tainty, respectively. The focus is on stochastic elliptic partial differential equations. 


M. Grigoriu, Stochastic Systems, Springer Series in Reliability Engineering,
DOI: 10.1007/978-1-4471-2327-9_9, © Springer-Verlag London Limited 2012




It is argued in Sect. 9.4 that finite element and other numerical methods for deter- 
ministic PDEs [7, 8, 9, 10] cannot be extended directly to solve SPDEs. We examine 
the efficiency and accuracy of Monte Carlo, stochastic reduced order models, sto- 
chastic Galerkin, and stochastic collocation methods for solving SPDEs. Section 9.5 
reviews briefly the Taylor, perturbation, and Neumann series methods for solving 
SPDEs with coefficients of small uncertainty. 


9.2 Stochastic Partial Differential Equations 

Partial differential equations with random coefficients, source term, and/or initial 
and boundary conditions depending on space and time are referred to as stochastic 
partial differential equations (SPDEs). Generally, the random coefficients of the 
equations considered in the mathematical literature are white noise processes that 
are more general than those considered so far in the book since they depend on 
both time and space. The solution of these equations requires extensions of our 
previous developments related to stochastic integrals, differentiation formulas, and 
other advanced concepts on probability theory and random functions. 

For example, let $(\Omega, \mathcal{F}, P)$ be a probability space endowed with a filtration
$\mathcal{F}_t$, $t \ge 0$. A real-valued function $\Phi(x, t)$, $x \in \mathbb{R}^d$, $t \ge 0$, is said to be a spatially
dependent martingale if it is an $\mathcal{F}_t$-martingale for each $x \in \mathbb{R}^d$. If Z(x, t) is an $\mathcal{F}_t^B$-adapted,
real-valued process on $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, P)$ such that $\int_0^\tau |Z(x, t)|^2\, dt < \infty$
a.s. for each $x \in \mathbb{R}^d$ and $\tau > 0$, then the Ito integrals $I(x) = \int_0^\tau Z(x, t)\, dB(t)$
are well defined for each $x \in \mathbb{R}^d$, where $\mathcal{F}_t^B = \sigma(B(s), 0 \le s \le t)$ denotes
the filtration generated by a Brownian motion B defined on $(\Omega, \mathcal{F}, P)$ (Sect. 4.4).
However, I(x), $x \in \mathbb{R}^d$, may not be a random field since the union of the null sets
N(x) on which I(x) is not defined may not be a null set or even a measurable set.
Also, the Ito formulas established in Sects. 5.2 and 5.3 need to be generalized to be
applied to functions of spatially dependent martingales and semimartingales ([11],
Theorem 2.3, p. 20). The reader interested in mathematical developments on SPDEs
is referred to [11, 12]. Our discussion in this section is based on [11].

Let Z(x), $x \in D$, be a real-valued random field with mean 0, finite variance, and
correlation function r(x, y) = E[Z(x) Z(y)] assumed to be continuous, where D is
a bounded subset of $\mathbb{R}^d$. Since Z(x) has finite variance, $\int_{D \times D} r(x, y)^2\, dx\, dy < \infty$ and

\[
Z(x) = \sum_{k=1}^{\infty} \sqrt{\mu_k}\, Z_k\, \varphi_k(x), \quad x \in D, \tag{9.1}
\]

holds in the mean square sense, where $\{\mu_k > 0\}$ and $\{\varphi_k\}$ denote the eigenvalues
and the eigenfunctions of the integral equation $\int_D r(x, y)\, \varphi(y)\, dy = \mu\, \varphi(x)$, $x \in D$,
and $\{Z_k\}$ are uncorrelated random variables with mean 0 and variance 1. If
Z(x) is a Gaussian field, then $\{Z_k\}$ are independent N(0, 1) variables (Sect. 3.6.5).
The correlation functions $r_n(x, y) = E[Z^{(n)}(x) Z^{(n)}(y)] = \sum_{k=1}^{n} \mu_k\, \varphi_k(x)\, \varphi_k(y)$
of the random fields $Z^{(n)}(x) = \sum_{k=1}^{n} \sqrt{\mu_k}\, Z_k\, \varphi_k(x)$, n = 1, 2, . . . , obtained
by truncating the representation of Z(x) after n terms converge to r(x, y) =
E[Z(x) Z(y)] as $n \to \infty$ (Theorem 3.22). It can be assumed that the eigenfunctions
in (9.1) form an orthonormal sequence, that is, $\int_D \varphi_k(x)\, \varphi_l(x)\, dx = \delta_{kl}$.
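The convergence of the truncated correlation functions $r_n$ to r can be illustrated with a kernel whose eigenpairs are known in closed form. The sketch below is a minimal Python example that assumes, purely for illustration, the kernel r(x, y) = min(x, y) on D = (0, 1), for which $\mu_k = ((k - 1/2)\pi)^{-2}$ and $\varphi_k(x) = \sqrt{2} \sin((k - 1/2)\pi x)$. This kernel is not taken from the text, and the function names are hypothetical.

```python
import math

def kl_eigenpair(k):
    """Eigenvalue and eigenfunction k = 1, 2, ... of r(x, y) = min(x, y) on (0, 1)."""
    freq = (k - 0.5) * math.pi
    return 1.0 / freq ** 2, (lambda x: math.sqrt(2.0) * math.sin(freq * x))

def truncated_correlation(x, y, n):
    """r_n(x, y): sum over k = 1..n of mu_k phi_k(x) phi_k(y)."""
    total = 0.0
    for k in range(1, n + 1):
        mu_k, phi_k = kl_eigenpair(k)
        total += mu_k * phi_k(x) * phi_k(y)
    return total

# Truncation errors at a diagonal point and at an off-diagonal point.
err_10 = abs(truncated_correlation(0.5, 0.5, 10) - 0.5)
err_200 = abs(truncated_correlation(0.5, 0.5, 200) - 0.5)
err_off = abs(truncated_correlation(0.3, 0.8, 200) - min(0.3, 0.8))
```

Since the eigenvalues decay like $k^{-2}$, the truncation error of $r_n$ decays like 1/n for this kernel. Smoother correlation functions generally have faster decaying eigenvalues.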

Definition 9.1 Let $\{B_k\}$ be independent Brownian motions defined on a probability
space $(\Omega, \mathcal{F}, P)$, and set

\[
W^{(n)}(x, t) = \sum_{k=1}^{n} \sqrt{\mu_k}\, \varphi_k(x)\, B_k(t), \quad x \in D, \ t \in [0, \tau], \tag{9.2}
\]

with $(\mu_k, \varphi_k)$ as in (9.1). We refer to $\{W^{(n)}\}$ as spatially dependent stochastic
processes.
The processes $W^{(n)}(x, t)$ in (9.2) have zero means and correlation functions

\[
E[W^{(n)}(x, t)\, W^{(n)}(y, s)] = \sum_{k=1}^{n} \mu_k\, \varphi_k(x)\, \varphi_k(y)\, (s \wedge t) = r_n(x, y)\, (s \wedge t), \quad n = 1, 2, \ldots, \tag{9.3}
\]

so that $r_n(x, y) \to r(x, y)$ by properties of the Karhunen-Loeve representation. Also,
for n > m, $\|W^{(n)} - W^{(m)}\|^2_{L^2(D)} = \sum_{k=m+1}^{n} \mu_k\, B_k(t)^2 \int_D \varphi_k(x)^2\, dx = \sum_{k=m+1}^{n} \mu_k\, B_k(t)^2$
by properties of $\{\varphi_k(x)\}$, so that $E[\|W^{(n)} - W^{(m)}\|^2_{L^2(D)}] = t \sum_{k=m+1}^{n} \mu_k
\le \tau \sum_{k=m+1}^{n} \mu_k$. Under the assumption $\sum_{k=1}^{\infty} \mu_k < \infty$, we have $\sum_{k=m+1}^{n} \mu_k \to 0$
as $m, n \to \infty$. Since $\|W^{(n)}(\cdot, t) - W^{(m)}(\cdot, t)\|^2_{L^2(D)}$ is a real-valued continuous
submartingale, there exists a constant c > 0 such that ([11], Theorem 3.3, p. 11)

\[
E\Big[\sup_{0 \le t \le \tau} \|W^{(n)}(\cdot, t) - W^{(m)}(\cdot, t)\|^2_{L^2(D)}\Big]
\le c\, E\big[\|W^{(n)}(\cdot, \tau) - W^{(m)}(\cdot, \tau)\|^2_{L^2(D)}\big] \le c\, \tau \sum_{k=m+1}^{n} \mu_k,
\]

which implies $E[\sup_{0 \le t \le \tau} \|W^{(n)}(\cdot, t) - W^{(m)}(\cdot, t)\|^2_{L^2(D)}] \to 0$ as $m, n \to \infty$.
Hence, $\{W^{(n)}(x, \cdot)\}$ are Brownian processes for each $x \in D$ and

\[
W(x, t) = \lim_{n \to \infty} W^{(n)}(x, t) = \sum_{k=1}^{\infty} \sqrt{\mu_k}\, \varphi_k(x)\, B_k(t), \quad x \in D, \ t \in [0, \tau], \tag{9.4}
\]

exists in the mean square sense.

Definition 9.2 The formal time derivative $\dot W(x, t) = \partial W(x, t)/\partial t$ of W(x, t) is
called spatially dependent Gaussian white noise. We also refer to the temporal derivatives
$\dot W^{(n)}(x, t) = \partial W^{(n)}(x, t)/\partial t$ as spatially dependent white noise processes.
Formal calculations show that the white noise processes $\dot W(x, t)$ and $\dot W^{(n)}(x, t)$
have mean 0. Their covariance functions are $E[\dot W(x, t)\, \dot W(y, s)] = r(x, y)\, \delta(s - t)$
and $E[\dot W^{(n)}(x, t)\, \dot W^{(n)}(y, s)] = r_n(x, y)\, \delta(s - t)$, respectively.

The following example adapted from [11] (Sect. 3.3) illustrates technical diffi- 
culties involved in the solution of SPDEs of the type considered in the mathemat- 
ical literature, and the need to introduce simplifying assumptions, that may be too 
restrictive for applications. The problem in this example is also solved in Example 
9.2 by using finite differences and methods in Sect. 7.2 to approximate spatial deriv- 
atives and solve the resulting ordinary differential equations driven by random noise. 
The approach in Example 9.2 is conceptually simple, computationally efficient, 
and accurate. 

Example 9.1 Let U(x, t) be a real-valued random function satisfying the stochastic
partial differential equation

\[
\frac{\partial U(x, t)}{\partial t} = \Big(\beta\, \frac{\partial^2}{\partial x^2} - \alpha\Big) U(x, t) + \dot W(x, t), \quad x \in (0, l), \ t \in (0, \tau), \tag{9.5}
\]

with boundary and initial conditions U(0, t) = U(l, t) = 0 and U(x, 0) = h(x), where
$\alpha, \beta, l > 0$ are constants, h(x) is a smooth deterministic function, and $\dot W(x, t)$ denotes
a spatially dependent Gaussian white noise. The solution of (9.5) can be given in the
form

\[
U(x, t) = \sum_{k=1}^{\infty} U_k(t)\, e_k(x), \quad x \in (0, l), \ t \in (0, \tau), \tag{9.6}
\]

where $U_k(t)$ are Ornstein-Uhlenbeck processes defined by

\[
dU_k(t) = -\lambda_k U_k(t)\, dt + \sqrt{\mu_k}\, dB_k(t), \quad t \in (0, \tau), \tag{9.7}
\]

$\{B_k(t)\}$ are independent Brownian motions, the initial state for (9.7) is $U_k(0) =
\langle h, e_k \rangle / \|e_k\|^2$, which equals $2(1 - \cos(\rho_k l))/(\rho_k l)$ for h(x) = 1, $\|\cdot\|$ is the norm in $L^2(0, l)$, and $\lambda_k$ and $e_k(x)$
denote the eigenvalues and eigenfunctions of the operator $\mathcal{A} = \beta\, \partial^2/\partial x^2 - \alpha$, so
that they satisfy $-\beta\, e''(x) + \alpha\, e(x) = \lambda\, e(x)$ or, equivalently, $e''(x) + \rho^2 e(x) = 0$,
with the boundary conditions e(0) = e(l) = 0, where $\rho^2 = (\lambda - \alpha)/\beta$. We have
$\rho_k = k \pi / l$ and $e_k(x) = \sin(k \pi x / l)$, k = 1, 2, . . . , so that $\lambda_k = \alpha + \beta \rho_k^2$. Since

\[
U_k(t) = U_k(0)\, e^{-\lambda_k t} + \sqrt{\mu_k}\, I_k(t), \quad \text{where} \quad I_k(t) = \int_0^t e^{-\lambda_k (t - s)}\, dB_k(s), \tag{9.8}
\]

we have

\[
U(x, t) = \sum_{k=1}^{\infty} U_k(0)\, e^{-\lambda_k t}\, e_k(x) + \sum_{k=1}^{\infty} \sqrt{\mu_k}\, I_k(t)\, e_k(x). \tag{9.9}
\]
Fig. 9.1 Expectations $E[U^{(n)}(x, t)]$ as functions of $x \in (0, l)$ at several times (left panel) and
covariance matrix of $U^{(n)}(x, \tau)$ (right panel)

Denote by

\[
U^{(n)}(x, t) = \sum_{k=1}^{n} U_k(0)\, e^{-\lambda_k t}\, e_k(x) + \sum_{k=1}^{n} \sqrt{\mu_k}\, I_k(t)\, e_k(x) \tag{9.10}
\]

the series representation of U(x, t) truncated after the first n terms.

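The mean and covariance functions of U(x, t) are (Sect. 7.2.2)

\[
\begin{aligned}
E[U(x, t)] &= \sum_{k=1}^{\infty} U_k(0)\, e^{-\lambda_k t}\, e_k(x) \quad \text{and} \\
\mathrm{Cov}[U(x, t), U(y, s)] &= \sum_{k=1}^{\infty} \frac{\mu_k}{2 \lambda_k} \big(e^{-\lambda_k |s - t|} - e^{-\lambda_k (s + t)}\big)\, e_k(x)\, e_k(y),
\end{aligned} \tag{9.11}
\]

with $U_k(0)$ denoting the initial states of (9.7).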

Numerical results have been obtained for $\alpha = \beta = 1$, l = 1, $\tau = 1$, h(x) = 1,
and $U^{(n)}(x, t) = \sum_{k=1}^{n} U_k(t)\, e_k(x)$ with n = 30. The left panel in Fig. 9.1 shows
expectations of $U^{(n)}(x, t)$ as functions of $x \in (0, l)$ at several times. The expectation
of U(x, t) approaches 0 as time increases. The covariance matrix of $U^{(n)}(x, \tau)$ shown
in the right panel of the figure is zero on the boundary of $(0, l)^2$ in agreement with
the specified boundary values for U(x, t).
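The truncated series for the mean and covariance of $U^{(n)}$ can be evaluated directly. The sketch below is a minimal Python example for $\alpha = \beta = 1$, l = 1, h(x) = 1, and n = 30. The noise eigenvalues $\mu_k = 1/k^2$ are a hypothetical summable sequence chosen only for illustration, since their values are not stated above, and the function names are not from the text.

```python
import math

ALPHA, BETA, LENGTH, N_TERMS = 1.0, 1.0, 1.0, 30

def lam(k):
    """Eigenvalue lambda_k = alpha + beta * (k pi / l)^2."""
    return ALPHA + BETA * (k * math.pi / LENGTH) ** 2

def mu_noise(k):
    """Hypothetical noise eigenvalues; any summable sequence could be used."""
    return 1.0 / k ** 2

def initial_state(k):
    """<h, e_k> / ||e_k||^2 for h(x) = 1 and e_k(x) = sin(k pi x / l)."""
    rho = k * math.pi / LENGTH
    return 2.0 * (1.0 - math.cos(rho * LENGTH)) / (rho * LENGTH)

def e_k(k, x):
    return math.sin(k * math.pi * x / LENGTH)

def mean_u(x, t, n=N_TERMS):
    """Truncated mean: sum of U_k(0) exp(-lambda_k t) e_k(x)."""
    return sum(initial_state(k) * math.exp(-lam(k) * t) * e_k(k, x)
               for k in range(1, n + 1))

def cov_u(x, y, t, s, n=N_TERMS):
    """Truncated covariance of the random part at (x, t) and (y, s)."""
    return sum(mu_noise(k) / (2.0 * lam(k))
               * (math.exp(-lam(k) * abs(s - t)) - math.exp(-lam(k) * (s + t)))
               * e_k(k, x) * e_k(k, y)
               for k in range(1, n + 1))

mean_start = mean_u(0.5, 0.0)          # close to h(0.5) = 1
mean_end = mean_u(0.5, 1.0)            # decays toward 0
var_mid = cov_u(0.5, 0.5, 1.0, 1.0)
cov_edge = cov_u(LENGTH, 0.5, 1.0, 1.0)
```

The mean at the final time is close to zero and the covariance vanishes when one argument lies on the boundary, in line with the behavior described for Fig. 9.1.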

The deterministic part $U^{(d,n)}(x, t) = \sum_{k=1}^{n} U_k(0)\, e^{-\lambda_k t}\, e_k(x)$ of $U^{(n)}$ in (9.10) converges
to that of U by properties of deterministic partial differential equations. The
random part $U^{(r,n)}(x, t) = \sum_{k=1}^{n} \sqrt{\mu_k}\, I_k(t)\, e_k(x)$ of $U^{(n)}$ converges to that of U in
mean square provided $\sum_{k=1}^{\infty} \mu_k < \infty$. Moreover, U(x, t) is mean square continuous
and satisfies (9.5) in the weak sense, that is,

\[
\langle U(\cdot, t), \phi \rangle = \langle h, \phi \rangle + \int_0^t \langle U(\cdot, s), \mathcal{A}[\phi] \rangle\, ds + \langle W(\cdot, t), \phi \rangle \quad \text{a.s.}, \tag{9.12}
\]

where $\phi \in C_0^{\infty}(0, l)$ are trial functions. ◊
Proof The solution of the homogeneous version of (9.5) by separation of variables
has the form U(x, t) = X(x) T(t) so that $X(x)\, \dot T(t) = \beta\, X''(x)\, T(t) - \alpha\, X(x)\, T(t)$,
implying $\dot T(t)/T(t) = (\beta\, X''(x) - \alpha\, X(x))/X(x) = -\lambda$, where $\lambda > 0$ is a constant.
The general homogeneous solution is $U(x, t) = \sum_{k=1}^{\infty} c_k\, e^{-\lambda_k t}\, e_k(x)$, where $c_k$
are constants and $\lambda_k$ and $e_k(x)$ denote the eigenvalues and eigenfunctions of the
differential equation $\beta\, e''(x) - \alpha\, e(x) = -\lambda\, e(x)$ or $e''(x) + \rho^2 e(x) = 0$ satisfying
the boundary conditions e(0) = e(l) = 0. The notation $\rho^2 = (\lambda - \alpha)/\beta$ is meaningful
since $0 < \alpha < \lambda_1 < \lambda_2 < \cdots$ and $\beta > 0$.

If the white noise admits the representation $\dot W(x, t) = \sum_{k \ge 1} \sqrt{\mu_k}\, e_k(x)\, \dot B_k(t)$,
that is, $\varphi_k(x)$ in (9.2) are the eigenfunctions $e_k(x)$ of $\mathcal{A} = \beta\, \partial^2/\partial x^2 - \alpha$ and $\dot B_k(t)$
denotes the formal derivative of $B_k(t)$, then (9.5) becomes

\[
\sum_{k \ge 1} \big[\dot U_k(t) + \lambda_k U_k(t) - \sqrt{\mu_k}\, \dot B_k(t)\big]\, e_k(x) = 0,
\]

which gives (9.7) since this equality must hold for every $x \in (0, l)$.
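The mean square convergence of the random part $U^{(r,n)}$ of $U^{(n)}$ to the random
part $U^{(r)}$ of U holds under the assumption $\sum_{k=1}^{\infty} \mu_k < \infty$ since, by the independence of
$\{I_k\}$, the orthogonality of $\{e_k\}$ in $L^2(0, l)$, $\|e_k\|^2 = l/2$, and $\lambda_k > \alpha$,

\[
E\big[\|U^{(r,n)}(\cdot, t)\|^2\big] = \sum_{k=1}^{n} \mu_k\, E[I_k(t)^2]\, \|e_k\|^2 = \frac{l}{2} \sum_{k=1}^{n} \mu_k\, \frac{1 - e^{-2 \lambda_k t}}{2 \lambda_k} \le \frac{l}{4 \alpha} \sum_{k=1}^{n} \mu_k,
\]

so that $\sup_{0 \le t \le \tau} E[\|U^{(r,n)}(\cdot, t)\|^2] < \infty$ and $\sup_{0 \le t \le \tau} E[\|U^{(r)}(\cdot, t) - U^{(r,n)}(\cdot, t)\|^2]
\le (l/(4\alpha)) \sum_{k=n+1}^{\infty} \mu_k \to 0$ as $n \to \infty$ since $\sum_{k=1}^{\infty} \mu_k < \infty$. Similar arguments
can be used to show that $U^{(r)}$ is mean square continuous, that is, $E[\|U^{(r)}(\cdot, s) -
U^{(r)}(\cdot, t)\|^2] \to 0$ as $|s - t| \to 0$ (Exercise 9.1).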

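That $U^{(n)}(x, t)$ in (9.6) is a weak solution for (9.5), that is, U(x, t) satisfies (9.12),
follows from the formula $U^{(n)}(x, t) = h(x) + \int_0^t \mathcal{A}[U^{(n)}(x, s)]\, ds + W(x, t)$ resulting
from (9.5) by time integration, the representation of $U^{(n)}(x, t)$ in (9.10), and the
equalities

\[
\begin{aligned}
\langle U^{(n)}(\cdot, t), \phi \rangle
&= \langle h^{(n)}, \phi \rangle - \sum_{k=1}^{n} \lambda_k \int_0^t U_k(s)\, ds\, \langle e_k, \phi \rangle + \langle W^{(n)}(\cdot, t), \phi \rangle \\
&= \langle h^{(n)}, \phi \rangle + \sum_{k=1}^{n} \int_0^t U_k(s)\, ds\, \langle \mathcal{A}[e_k], \phi \rangle + \langle W^{(n)}(\cdot, t), \phi \rangle \\
&= \langle h^{(n)}, \phi \rangle + \sum_{k=1}^{n} \int_0^t U_k(s)\, ds\, \langle e_k, \mathcal{A}[\phi] \rangle + \langle W^{(n)}(\cdot, t), \phi \rangle \\
&= \langle h^{(n)}, \phi \rangle + \int_0^t \langle U^{(n)}(\cdot, s), \mathcal{A}[\phi] \rangle\, ds + \langle W^{(n)}(\cdot, t), \phi \rangle,
\end{aligned}
\]

where $h^{(n)} = \sum_{k=1}^{n} \langle h, e_k \rangle\, e_k$ and $\phi$ is a test function with the properties $\phi \in C^{\infty}[0, l]$
and $\phi(0) = \phi(l) = 0$. In the above equalities, we use the facts that $\mathcal{A}[e_k] =
\beta\, e_k'' - \alpha\, e_k = -\lambda_k e_k$ and that $\mathcal{A}$ is a self-adjoint operator (Sect. B.4.4).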

The weak solution is unique since, if $U^{(n)}(x, t) = \sum_{k=1}^{n} U_k(t)\, e_k(x)$ and
$\bar U^{(n)}(x, t) = \sum_{k=1}^{n} \bar U_k(t)\, e_k(x)$ are two weak solutions, then

\[
V(x, t) = U^{(n)}(x, t) - \bar U^{(n)}(x, t) = \sum_{k=1}^{n} \big(U_k(t) - \bar U_k(t)\big)\, e_k(x) = \sum_{k=1}^{n} V_k(t)\, e_k(x),
\]

so that $V_k(t) = -\lambda_k \int_0^t V_k(s)\, ds$ for $\phi = e_k$ and

\[
|V_k(t)|^2 = \Big(\lambda_k \int_0^t V_k(s)\, ds\Big)^2 \le \lambda_k^2\, t \int_0^t V_k(s)^2\, ds
\]

by the Cauchy-Schwarz inequality. This bound on $|V_k(t)|^2$ and Gronwall's inequality
imply $V_k(t) = 0$, k = 1, . . . , n, a.s. for all $t \in [0, \tau]$ (Exercise 9.2). ▲

The approach in Example 9.1 for solving stochastic partial differential equations
is not common in the applied literature. The following two-step approach is usually 
preferred in applications. First, partial differential equations are approximated by 
ordinary differential equations by space discretization or difference equations by 
space-time discretization. Second, methods of random vibration (Sect. 7.2) are used 
to solve the approximate equations constructed in the previous step. 

Example 9.2 Consider the stochastic partial differential equation in Example 9.1.
Let $0 = x_0 < x_1 < \cdots < x_i < \cdots < x_n < x_{n+1} = l$ be equally spaced points in
[0, l], $\Delta x = x_i - x_{i-1}$, i = 1, . . . , n + 1, and $\Delta t > 0$. The finite difference version
of (9.5) is

\[
\begin{aligned}
Y_i(t + \Delta t) &= a\, Y_{i-1}(t) + b\, Y_i(t) + a\, Y_{i+1}(t) + \dot W_i(t)\, \Delta t, && i = 2, \ldots, n - 1, \\
Y_1(t + \Delta t) &= b\, Y_1(t) + a\, Y_2(t) + \dot W_1(t)\, \Delta t, && i = 1, \\
Y_n(t + \Delta t) &= a\, Y_{n-1}(t) + b\, Y_n(t) + \dot W_n(t)\, \Delta t, && i = n,
\end{aligned} \tag{9.13}
\]

where $Y_i(t) = U(x_i, t)$, $\dot W_i(t) = \dot W(x_i, t)$, $a = \beta\, \Delta t/(\Delta x)^2$, $b = 1 - 2 \beta\, \Delta t/(\Delta x)^2 - \alpha\, \Delta t$,
and $t \ge 0$. The finite difference equations for i = 1 and i = n account
for the boundary values of U(x, t). The matrix form of (9.13) is

\[
Y(t + \Delta t) = A\, Y(t) + L\, N, \quad t \ge 0, \tag{9.14}
\]

where $Y(t) = (Y_1(t), \ldots, Y_n(t))'$, A is an (n, n)-matrix with zero entries except for
A(i, i) = b and A(i, i − 1) = A(i, i + 1) = a, L denotes a lower triangular matrix
such that $L L' = R = \{r(x_i, x_j)\}$, i, j = 1, . . . , n, and N is an $\mathbb{R}^n$-valued random
variable with independent $N(0, \Delta t)$ coordinates.

Fig. 9.2 Expectations of Y(t) at several times (left panel) and covariance matrix of $Y(\tau)$ (right
panel)

The left and right panels in Fig. 9.2 are analogues to those in Fig. 9.1. They give
the expectations of Y(t) at various times and the covariance of $Y(\tau)$ for the same
values of $\alpha$, $\beta$, l, and h(x) as in Example 9.1, n = 20, and $\Delta t = 0.001$. The solution
of (9.5) by finite differences is conceptually simple, provides results similar to those
in Example 9.1, and uses familiar concepts. Moreover, the finite difference method
is not restricted to noise processes W(x, t) with the spatial and temporal correlations
in Example 9.1. ◊

Proof The finite difference formulas in Theorem 7.1 applied to (9.14) give $\mu(t +
\Delta t) = A\, \mu(t)$ with $\mu(0) = (h(x_1), \ldots, h(x_n))'$ and $\gamma(t + \Delta t) = A\, \gamma(t)\, A' + R\, \Delta t$
with $\gamma(0) = 0$, where $\mu(t) = E[Y(t)]$ and $\gamma(t) = E[(Y(t) - \mu(t))(Y(t) - \mu(t))']$.

The time and space steps, $\Delta t$ and $\Delta x$, have to be selected such that the eigenvalues
of A are included in the unit disc centered at the origin of the complex plane. ▲
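The moment recursions in the proof can be sketched in a few lines. The following minimal Python example uses n = 20, $\Delta t = 0.001$, $\alpha = \beta = 1$, l = 1, and h(x) = 1. The spatial correlation $r(x, y) = \exp(-|x - y|/0.2)$ and the number of time steps are hypothetical choices made for illustration, and the helper names are not from the text. The tridiagonal structure of A is used to avoid full matrix products.

```python
import math

def tridiag_apply(a, b, m):
    """Return A m for the tridiagonal A (diagonal b, off-diagonals a); m is a list of rows."""
    n = len(m)
    out = []
    for i in range(n):
        row = [b * v for v in m[i]]
        if i > 0:
            row = [r + a * v for r, v in zip(row, m[i - 1])]
        if i < n - 1:
            row = [r + a * v for r, v in zip(row, m[i + 1])]
        out.append(row)
    return out

def transpose(m):
    return [list(col) for col in zip(*m)]

def moment_recursion(n=20, dt=0.001, steps=200, alpha=1.0, beta=1.0,
                     length=1.0, corr_len=0.2):
    """Propagate the mean and covariance of Y(t) over the given number of steps."""
    dx = length / (n + 1)
    a = beta * dt / dx ** 2
    b = 1.0 - 2.0 * a - alpha * dt
    x = [(i + 1) * dx for i in range(n)]
    # Hypothetical correlation function r(x, y) = exp(-|x - y| / corr_len).
    r = [[math.exp(-abs(xi - xj) / corr_len) for xj in x] for xi in x]
    mean = [[1.0] for _ in x]                 # mu(0) for h(x) = 1, stored as a column
    cov = [[0.0] * n for _ in range(n)]       # gamma(0) = 0
    for _ in range(steps):
        mean = tridiag_apply(a, b, mean)
        half = tridiag_apply(a, b, cov)                         # H = A gamma
        full = transpose(tridiag_apply(a, b, transpose(half)))  # H A' = (A H')'
        cov = [[full[i][j] + r[i][j] * dt for j in range(n)] for i in range(n)]
    return a, b, [m[0] for m in mean], cov

a, b, mean, cov = moment_recursion()
```

For these parameters b > 0 and b + 2a < 1, so every row of A has nonnegative entries summing to less than one and the eigenvalues of A lie inside the unit disc, which is the stability condition stated in the proof.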


9.3 Discrete Approximations of SPDEs 

It was shown in Example 9.2 that finite difference versions of SPDEs can be used
to solve these equations approximately. Consider a SPDE $\mathcal{L}[U(x, t)] = V(x, t)$
defined on an open bounded subset D of $\mathbb{R}^d$ in a time interval $(0, \tau)$, and let
$\mathcal{L}_k^n[U_k^n] = V_k^n$ be a finite difference version of this equation defined on a space-time
lattice or mesh in $D \times (0, \tau)$, where $\{U_k^n\}$ are finite difference approximations
for $\{U(x_k, t_n)\}$, $V_k^n = V(x_k, t_n)$, $x_k$ is the coordinate of node k, $t_n = n\, \Delta t$, and
$\Delta t > 0$ denotes the time step. Our objective is to establish conditions under which
$\{U_k^n\}$ provides an approximation for U(x, t), $(x, t) \in D \times (0, \tau)$. It is assumed that
$\mathcal{L}[U(x, t)] = V(x, t)$ is well-posed, that is, the magnitude of changes in U(x, t)
caused by changes in data can be bounded by the magnitude of changes in data scaled
by a constant.

Let U(x, t) be the potential in a random heterogeneous material specimen occupying
an open bounded subset D of $\mathbb{R}^d$. Suppose the material diffusivity/conductivity in
this specimen can be modeled by a real-valued random field $\Sigma(x)$, $x \in D$, defined
on a probability space $(\Omega, \mathcal{F}, P)$. Then U(x, t) is the solution of

\[
\frac{\partial U(x, t)}{\partial t} = \sum_{p=1}^{d} \frac{\partial \Sigma(x)}{\partial x_p} \frac{\partial U(x, t)}{\partial x_p} + \Sigma(x)\, \Delta U(x, t) + V(x, t), \quad x \in D, \ t > 0, \tag{9.15}
\]

accompanied by initial and boundary conditions, where $\Delta = \sum_{p=1}^{d} \partial^2/\partial x_p^2$ denotes
the Laplace operator and V(x, t) is the source term at location $x \in D$ and time
t > 0. An alternative form of (9.15) is $\mathcal{L}[U(x, t)] = V(x, t)$ with $\mathcal{L}[U(x, t)] =
\partial U(x, t)/\partial t - \nabla \cdot (\Sigma(x)\, \nabla U(x, t))$.

It is assumed that (1) almost all the samples of the conductivity field $\Sigma$ have
continuous first order derivatives and take values in a bounded interval [a, b], 0 <
a < b < ∞, (2) the source term V is a deterministic function that is continuous
in its arguments, and (3) the initial and the boundary conditions are deterministic
and provide adequate information on U in D at the initial time t = 0 and on the
boundary $\partial D$ of D at all times $t \ge 0$ such that (9.15) has a unique solution for almost
all samples of $\Sigma(x)$. Since $\Sigma(x)$ is a random field, (9.15) is a stochastic partial
differential equation and its solution is a random function of time and space defined
on the same probability space as the conductivity random field.

We use finite differences to construct an ordinary differential equation with state
Y(t), $t \ge 0$, approximating the solution U(x, t) of (9.15). The case d = 1 is considered
in detail. The extension to the case d > 1 is also discussed. Let U(x, t) be the solution
of (9.15) with d = 1 and D = (0, l), 0 < l < ∞. For simplicity, V(x, t) is assumed to
be a specified deterministic function, and U(x, t) is required to satisfy the deterministic
initial and boundary conditions U(x, 0) = f(x) for $x \in D$, and $U(0, t) = g_1(t)$
and $U(l, t) = g_2(t)$ for $t \ge 0$.

Let $h = l/(\nu + 1)$ and $\Delta t > 0$ denote space and time steps, where $\nu > 0$ is an
integer. The finite difference version of (9.15) is

$$U^*(x, t+\Delta t) = \frac{\Delta t}{h^2}\,A_1(x)\,U^*(x+h,t) + \left(1 - \frac{\Delta t}{h^2}\,A_2(x)\right) U^*(x,t) + \frac{\Delta t}{h^2}\,A_3(x)\,U^*(x-h,t) + V(x,t)\,\Delta t, \tag{9.16}$$

where the coefficients $A_1(x) = (\Sigma(x+h) - \Sigma(x-h) + 4\,\Sigma(x))/4$, $A_2(x) =
2\,\Sigma(x)$, and $A_3(x) = (-\Sigma(x+h) + \Sigma(x-h) + 4\,\Sigma(x))/4$ depend on the conductivity
random field and $U^*$ is defined at the nodes of the finite difference mesh with
spatial and temporal coordinates $x_k = k\,h$ and $t_n = n\,\Delta t$, that is, $U^*(x_k, t_n) = U_k^n$
and $V(x_k, t_n) = V_k^n$ with the previous notations. The finite difference operator $\mathcal{L}_k^n$ in
(9.16) has been obtained from $\mathcal{L}$ in (9.15) by approximating the spatial and temporal
derivatives of $U(x,t)$ by central and forward finite differences, respectively.
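The update in (9.16) can be coded directly for a single sample of the conductivity field. The following is a minimal Python sketch under stated assumptions, not an implementation taken from the text: the names `coefficients` and `fd_step`, the storage of nodal values in plain lists (boundary nodes included), and the treatment of the boundary values `g1`, `g2` as numbers valid at the new time level are choices made only for this illustration.

```python
def coefficients(sigma, k):
    """Return A1, A2, A3 of (9.16) at interior node k from nodal values of one sample."""
    a1 = (sigma[k + 1] - sigma[k - 1] + 4.0 * sigma[k]) / 4.0
    a2 = 2.0 * sigma[k]
    a3 = (-sigma[k + 1] + sigma[k - 1] + 4.0 * sigma[k]) / 4.0
    return a1, a2, a3


def fd_step(u, sigma, v, h, dt, g1, g2):
    """Advance the nodal values u by one time step of (9.16).

    u, sigma, v hold values at x_k = k*h, k = 0, ..., nu + 1 (assumed layout);
    g1 and g2 are the prescribed boundary values at the new time level.
    """
    r = dt / h**2
    new = [g1] + [0.0] * (len(u) - 2) + [g2]
    for k in range(1, len(u) - 1):
        a1, a2, a3 = coefficients(sigma, k)
        new[k] = (r * a1 * u[k + 1] + (1.0 - r * a2) * u[k]
                  + r * a3 * u[k - 1] + v[k] * dt)
    return new
```

For a constant conductivity sample and zero source, a linear profile is a steady state of (9.15), so `fd_step` should return it unchanged up to rounding; note also that $A_1(x) + A_3(x) = A_2(x)$ for any sample.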

Theorem 9.1 Let $e(x,t) = U(x,t) - U^*(x,t)$ be the discrepancy between the
solutions $U$ and $U^*$, and let $\mathcal{E}(t) = \max_{0 \le x \le l} |e(x,t)|$. If $U$ is six times differentiable
in $x$ and three times differentiable in $t$, $A_1(x), A_3(x) > 0$, $x \in [0, l]$, for almost all
samples of $\Sigma(x)$, and $\Sigma(x)$ takes values in a bounded interval $[a,b]$, $0 < a < b <
\infty$, almost surely, then

$$\frac{d\mathcal{E}(t)}{dt} \le c_1(h), \quad \text{with } \mathcal{E}(0) = 0 \text{ and } t > 0, \tag{9.17}$$

so that $\mathcal{E}(t) \le c_1(h)\,t$. Since $\lim_{h \to 0} c_1(h) = 0$, the error $\mathcal{E}(t)$ can be made as
small as desired by a proper selection of $h$.

Proof Our arguments extend slightly upon considerations in [13] (Sect. 52) on deterministic
parabolic differential equations. Since $U(x,t)$ does not satisfy (9.16), we have

$$U(x, t+\Delta t) = \frac{\Delta t}{h^2}\,A_1(x)\,U(x+h,t) + \left(1 - \frac{\Delta t}{h^2}\,A_2(x)\right) U(x,t) + \frac{\Delta t}{h^2}\,A_3(x)\,U(x-h,t) + V(x,t)\,\Delta t + T(x,t), \tag{9.18}$$

where $T(x,t)$ is a non-zero function. The error $e(x,t) = U(x,t) - U^*(x,t)$ is the
solution of the finite difference equation

$$e(x, t+\Delta t) = \frac{\Delta t}{h^2}\,A_1(x)\,e(x+h,t) + \left(1 - \frac{\Delta t}{h^2}\,A_2(x)\right) e(x,t) + \frac{\Delta t}{h^2}\,A_3(x)\,e(x-h,t) + T(x,t), \tag{9.19}$$

with zero initial and boundary conditions since $U$ and $U^*$ satisfy the same initial
and boundary conditions. Since almost all samples of $\Sigma$ take values in a bounded
interval $[a,b]$, $0 < a < b < \infty$, the functions $A_1, A_2, A_3$ are bounded a.s. and

$$\begin{aligned}
|e(x, t+\Delta t)| &\le \frac{\Delta t}{h^2}\,|A_1(x)|\,|e(x+h,t)| + \left|1 - \frac{\Delta t}{h^2}\,A_2(x)\right| |e(x,t)| + \frac{\Delta t}{h^2}\,|A_3(x)|\,|e(x-h,t)| + |T(x,t)| \\
&\le \left[\frac{\Delta t}{h^2}\,|A_1(x)| + \left|1 - \frac{\Delta t}{h^2}\,A_2(x)\right| + \frac{\Delta t}{h^2}\,|A_3(x)|\right] \mathcal{E}(t) + |T(x,t)|
\end{aligned} \tag{9.20}$$

holds for almost all samples of $\Sigma(x)$. We assume that the conditions $A_1(x), A_3(x) >
0$ and $1 - (\Delta t/h^2)\,A_2(x) \ge 0$ hold a.s., so that the square bracket on the right side
of (9.20) is equal to 1, since $A_1(x) + A_3(x) = A_2(x)$, and

$$|e(x, t+\Delta t)| \le \mathcal{E}(t) + \max_{0 \le x \le l,\; t \ge 0} |T(x,t)| \le \mathcal{E}(t) + M, \tag{9.21}$$

where $M$ is an upper bound on $|T(x,t)|$ for $(x,t) \in (0, l) \times (0, \tau)$. The inequality
(9.21) implies $\mathcal{E}(t + \Delta t) \le \mathcal{E}(t) + M$, so that $\mathcal{E}(n\,\Delta t) \le n\,M$, $n = 0, 1, \ldots$, since
$\mathcal{E}(0) = 0$. The assumptions $A_1(x), A_3(x) > 0$ and $1 - (\Delta t/h^2)\,A_2(x) \ge 0$ hold
a.s. for $5a - b > 0$ and proper selection of $h$ and $\Delta t$ since $A_2(x)$ is bounded. The
condition $5a - b > 0$ is very strong, that is, $A_1(x), A_3(x)$ may be positive although
$5a - b > 0$ is not satisfied, for example, the case in which $\Sigma(x)$ has continuous
samples and $h$ is sufficiently small.

We now construct an upper bound $M$ on $|T(x,t)|$. To simplify our discussion, the
error related to the approximation $\Sigma'(x) \approx (\Sigma(x+h) - \Sigma(x-h))/(2h)$ is not
considered. Under the assumptions that the solution $U(x,t)$ is six and three times
differentiable in $x$ and $t$, respectively, the following Taylor expansions of $U(x,t)$ hold

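$$\begin{aligned}
\frac{\partial U(x,t)}{\partial t} &= \frac{U(x,t+\Delta t) - U(x,t)}{\Delta t} - \frac{\Delta t}{2}\,\frac{\partial^2 U(x,t)}{\partial t^2} - \frac{\Delta t^2}{6}\,\frac{\partial^3 U(x,\tau)}{\partial t^3}, \\
\frac{\partial U(x,t)}{\partial x} &= \frac{U(x+h,t) - U(x-h,t)}{2h} - \frac{h^2}{6}\,\frac{\partial^3 U(x,t)}{\partial x^3} - \frac{h^4}{120}\,\frac{\partial^5 U(\xi,t)}{\partial x^5}, \\
\frac{\partial^2 U(x,t)}{\partial x^2} &= \frac{U(x+h,t) - 2\,U(x,t) + U(x-h,t)}{h^2} - \frac{h^2}{12}\,\frac{\partial^4 U(x,t)}{\partial x^4} - \frac{h^4}{360}\,\frac{\partial^6 U(\xi,t)}{\partial x^6},
\end{aligned} \tag{9.22}$$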

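where $\tau \in (t, t + \Delta t)$ and $\xi \in (x, x + h)$, so that $T(x,t)$ in (9.18) has the expression

$$\begin{aligned}
\frac{1}{\Delta t}\,T(x,t) = {} & -\frac{\Sigma(x+h) - \Sigma(x-h)}{2h}\,\frac{1}{2h} \left[\frac{2h^3}{6}\,\frac{\partial^3 U(x,t)}{\partial x^3} + \frac{2h^5}{120}\,\frac{\partial^5 U(\xi,t)}{\partial x^5}\right] \\
& - \frac{\Sigma(x)}{h^2} \left[\frac{h^4}{12}\,\frac{\partial^4 U(x,t)}{\partial x^4} + \frac{h^6}{360}\,\frac{\partial^6 U(\xi,t)}{\partial x^6}\right] + \frac{\Delta t}{2}\,\frac{\partial^2 U(x,t)}{\partial t^2} + \frac{\Delta t^2}{6}\,\frac{\partial^3 U(x,\tau)}{\partial t^3},
\end{aligned} \tag{9.23}$$

which implies

$$|T(x,t)| \le \Delta t\,[c_1(h) + c_2(\Delta t)], \tag{9.24}$$

where

$$c_1(h) = \frac{b - a}{2h} \left(\frac{h^2}{6}\,M_3 + \frac{h^4}{120}\,M_5\right) + b \left(\frac{h^2}{12}\,M_4 + \frac{h^4}{360}\,M_6\right), \qquad c_2(\Delta t) = \Delta t \left(\frac{N_2}{2} + \frac{\Delta t}{6}\,N_3\right), \tag{9.25}$$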


$|\partial^q U(x,t)/\partial x^q| \le M_q$, $q = 2, \ldots, 6$, $|\partial^r U(x,t)/\partial t^r| \le N_r$, $r = 2, 3$, and
$M_q$, $N_r$ are positive constants. Since the right side of (9.24) does not depend on $(x,t)$,
the constant $M$ in (9.21) can be set equal to $\Delta t\,[c_1(h) + c_2(\Delta t)]$, so that $\mathcal{E}(t + \Delta t) \le
\mathcal{E}(t) + \Delta t\,[c_1(h) + c_2(\Delta t)]$ or $[\mathcal{E}(t + \Delta t) - \mathcal{E}(t)]/\Delta t \le c_1(h) + c_2(\Delta t)$, which
gives (9.17) since $c_2(\Delta t) \to 0$ as $\Delta t \to 0$. Accordingly, we have $\mathcal{E}(t) \le c_1(h)\,t$,
that is, the bound on the error between the exact and finite difference solutions increases
linearly in time for a given $h$. Since $c_1(h) \to 0$ as $h \to 0$ by (9.25), $\mathcal{E}(t)$ can be made as
small as desired in any bounded time interval by an appropriate selection of mesh
size $h$.
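The sign conditions used in the proof, $A_1(x), A_3(x) > 0$ and $1 - (\Delta t/h^2)\,A_2(x) \ge 0$, can be checked node by node for a given sample of the conductivity field. The sketch below is only an illustration of that check; the function names and the list of nodal values `sigma` are assumptions introduced here, and the sufficient condition $5a - b > 0$ is the one quoted in the proof.

```python
def coefficients_positive(sigma):
    """True if A1 > 0 and A3 > 0 at every interior node of one conductivity sample."""
    for k in range(1, len(sigma) - 1):
        d = sigma[k + 1] - sigma[k - 1]
        # 4*A1 = 4*sigma_k + d and 4*A3 = 4*sigma_k - d
        if 4.0 * sigma[k] + d <= 0.0 or 4.0 * sigma[k] - d <= 0.0:
            return False
    return True


def max_stable_dt(sigma, h):
    """Largest dt with 1 - (dt/h^2) * A2 >= 0 at every node, where A2 = 2*sigma."""
    return h**2 / (2.0 * max(sigma))


def sufficient_condition(a, b):
    """Sample-independent condition 5a - b > 0 for values of the field in [a, b]."""
    return 5.0 * a - b > 0.0
```

For values in $[1, 4]$ the sufficient condition holds and every sample passes `coefficients_positive`; for values in $[1, 6]$ it fails, and a sample such as `[6.0, 1.0, 1.0]` indeed gives a non-positive coefficient.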

Similar arguments can be used to develop bounds on the error $\mathcal{E}(t)$ for transport
equations defined on a bounded domain $D \subset \mathbb{R}^d$, $d \ge 2$. As for the one-dimensional
case, the resulting bound on $\mathcal{E}(t)$ satisfies an inequality similar to that in (9.17). ▲

The finite difference scheme in (9.16) yields the recurrence formula

$$Y(t + \Delta t) = (I + A\,\Delta t)\,Y(t) + W(t)\,\Delta t, \quad t = n\,\Delta t, \tag{9.26}$$

where $Y(t)$ is a vector-valued process with coordinates $Y_k(t) = U^*(x_k, t)$, $A$ denotes a
random matrix resulting from (9.16), $I$ is the identity matrix, and $W(t)$ is a vector with
coordinates $\{V(x_k, t)\}$. The limit of (9.26) as $\Delta t \to 0$ gives the ordinary differential
equation

$$\dot{Y}(t) = A\,Y(t) + W(t), \quad t > 0, \tag{9.27}$$

with initial state $Y_k(0) = U(x_k, 0) = U^*(x_k, 0)$. Note that (9.27) can be obtained
directly from (9.15) by approximating only the spatial derivatives in this equation
by finite differences and that (9.26) results by integrating (9.27) over time intervals
of length $\Delta t$. The vector $Y(t)$ defined by (9.26) and (9.27) can be viewed as the state
of discrete and continuous time linear systems with random coefficients of the type
studied in Sects. 7.3 and 7.4.

The recurrence formula (9.16) and its matrix version (9.26) are useful only if
their solutions can be used to approximate $U(x,t)$ defined by (9.15), that is, $\{U_k^n\}$
converges in some sense to $U(x,t)$ as the spatial-temporal lattice used to construct the
finite difference operator is refined. The classical Lax-Richtmyer theorem states that
consistency and stability of a finite difference operator are necessary and sufficient
conditions for the convergence of $\{U_k^n\}$ to $U(x,t)$ for well-posed, deterministic, linear
partial differential equations ([10], Chap. 2). An operator $\mathcal{L}_k^n$ is consistent if the
truncation error $\mathcal{L}[U(x_k, t_n)] - \mathcal{L}_k^n[U(x_k, t_n)]$ converges to 0 as $h \to 0$ and $\Delta t \to 0$. We
say that $\mathcal{L}_k^n$ is stable if the recurrence formula relating future values $U(\cdot, t + \Delta t)$
to current values $U(\cdot, t)$ generated by this operator does not amplify perturbations
in data. The finite difference scheme in (9.16) has this property if, for example, a
norm $\|I + A\,\Delta t\|$ of matrix $I + A\,\Delta t$ in (9.26) that is compatible with the norm
of $Y(t)$ satisfies the condition $\|I + A\,\Delta t\| \le 1$ or, equivalently, the eigenvalues of
$I + A\,\Delta t$ are included in the unit disc centered at the origin of the complex plane
([10], Chap. 2).
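The stability condition on $I + A\,\Delta t$ can be inspected numerically for one sample of the conductivity field. The sketch below assembles the tridiagonal matrix implied by (9.16) for the interior nodes and evaluates its maximum absolute row sum; the choice of this particular norm, the function names, and the convention that boundary values are moved into the forcing vector are assumptions made for illustration only.

```python
def amplification_matrix(sigma, h, dt):
    """Rows of I + A*dt in (9.26) for interior nodes 1..nu of one conductivity sample.

    sigma holds nodal values at x_k = k*h, k = 0, ..., nu + 1 (assumed layout).
    """
    nu = len(sigma) - 2
    r = dt / h**2
    m = [[0.0] * nu for _ in range(nu)]
    for k in range(1, nu + 1):
        d = sigma[k + 1] - sigma[k - 1]
        a1 = (4.0 * sigma[k] + d) / 4.0
        a3 = (4.0 * sigma[k] - d) / 4.0
        m[k - 1][k - 1] = 1.0 - r * 2.0 * sigma[k]   # 1 - (dt/h^2) * A2
        if k < nu:
            m[k - 1][k] = r * a1                     # multiplies U*(x + h, t)
        if k > 1:
            m[k - 1][k - 2] = r * a3                 # multiplies U*(x - h, t)
    return m


def inf_norm(m):
    """Maximum absolute row sum, the matrix norm compatible with the max-norm of Y."""
    return max(sum(abs(x) for x in row) for row in m)
```

When $A_1, A_3 > 0$ and $1 - (\Delta t/h^2)\,A_2 \ge 0$ at all nodes, every row sum is at most 1 because $A_1 + A_3 = A_2$; a time step that violates the second condition gives a norm larger than 1.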

The proof of Theorem 9.1 is a direct application of arguments used to establish
the convergence of finite difference schemes for deterministic, linear partial differential
equations. We simply apply these arguments to almost all samples of the
random conductivity field. The following example uses an extension of the classical
Lax-Richtmyer theorem, referred to as the stochastic mean square Lax equivalence
theorem [14], stating that a consistent approximating scheme for an SPDE is convergent
in the mean square sense if and only if it is stable. The conditions in the
stochastic Lax theorem are weaker than those in Theorem 9.1 since they are for the
mean square convergence of the sequence of finite difference solutions.

Example 9.3 Let $U(x,t)$ be the solution of the partial differential equation

$$U(x,t) - U(x,0) + a \int_0^t \frac{\partial U(x,s)}{\partial x}\,ds + b \int_0^t U(x,s)\,dB(s) = 0, \quad t \in [0, \tau], \quad x \in \mathbb{R}, \tag{9.28}$$

with initial state $U(x,0) = f(x)$, where $f(x)$ denotes a specified deterministic function
and $B$ is a Brownian motion. The finite difference approximation of this equation is

$$U_k^{n+1} = (1 + R)\,U_k^n - R\,U_{k+1}^n - b\,U_k^n\,\Delta B_n, \quad n = 0, 1, \ldots, \quad k \in \mathbb{Z}, \tag{9.29}$$

where $U_k^n$ is an approximation for $U(k\,\Delta x, n\,\Delta t)$, $\Delta x$ and $\Delta t$ denote the space
and time steps of the finite difference mesh, $\partial U(k\,\Delta x, n\,\Delta t)/\partial t \approx (U_k^{n+1} - U_k^n)/
\Delta t$, $\partial U(k\,\Delta x, n\,\Delta t)/\partial x \approx (U_{k+1}^n - U_k^n)/\Delta x$, $R = a\,\Delta t/\Delta x$, and $\Delta B_n = B((n +
1)\,\Delta t) - B(n\,\Delta t)$. It is assumed that $a$, $\Delta x$, and $\Delta t$ are such that $R \in (-1, 0)$.
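A sample path of the scheme (9.29) can be generated by drawing independent Gaussian increments $\Delta B_n$ with mean 0 and variance $\Delta t$. The sketch below is an illustration only: the scheme is defined for $k \in \mathbb{Z}$, so the finite window of nodes and the zero value assumed beyond the last node are truncation assumptions introduced here, as are the function names `spde_step` and `simulate`.

```python
import math
import random


def spde_step(u, r, b, db):
    """One step of (9.29): U_k^{n+1} = (1 + R) U_k^n - R U_{k+1}^n - b U_k^n dB_n."""
    nxt = u[1:] + [0.0]   # U_{k+1}^n, with zero assumed beyond the last stored node
    return [(1.0 + r) * uk - r * uk1 - b * uk * db for uk, uk1 in zip(u, nxt)]


def simulate(f, a, b, dx, dt, n_nodes, n_steps, rng):
    """Generate one sample of {U_k^n} at time n_steps*dt on nodes x_k = k*dx."""
    r = a * dt / dx
    if not -1.0 < r < 0.0:
        raise ValueError("the example assumes R = a*dt/dx in (-1, 0)")
    u = [f(k * dx) for k in range(n_nodes)]
    for _ in range(n_steps):
        db = rng.gauss(0.0, math.sqrt(dt))   # Brownian increment over one time step
        u = spde_step(u, r, b, db)
    return u
```

A call such as `simulate(lambda x: math.exp(-x * x), -1.0, 0.5, 0.1, 0.05, 200, 40, random.Random(0))` returns one sample; statistics of the solution require repeating the call with independent random streams.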

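Let $\|\xi\|_{2,\Delta x} = \left(\sum_{k=-\infty}^{\infty} |\xi_k|^2\,\Delta x\right)^{1/2}$ denote the $\ell_{2,\Delta x}$-norm of a countable sequence
$\xi = (\ldots, \xi_{-1}, \xi_0, \xi_1, \ldots)$ of reals. The finite difference scheme is stable in this norm
since

$$E\big[\|U^{n+1}\|_{2,\Delta x}^2\big] \le (1 + b^2\,\Delta t)\,E\big[\|U^n\|_{2,\Delta x}^2\big] \le (1 + b^2\,\Delta t)^{n+1}\,E\big[\|U^0\|_{2,\Delta x}^2\big],$$

so that

$$E\big[\|U^{n+1}\|_{2,\Delta x}^2\big] \le \left(1 + \frac{b^2\,t}{n+1}\right)^{n+1} E\big[\|U^0\|_{2,\Delta x}^2\big] \le e^{b^2 t}\,E\big[\|U^0\|_{2,\Delta x}^2\big]$$

for $\Delta t = t/(n + 1)$.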

Let $\phi(x,t)$ be a smooth function, that is, a function that is continuously differentiable
in $x$ and continuous in $t$. The operator $\mathcal{L}$ defined by (9.28) applied in a time
interval $(n\,\Delta t, (n+1)\,\Delta t)$ at node $x_k = k\,\Delta x$ and the associated finite difference
operator are

$$\mathcal{L}[\phi]_k^n = \phi(x_k, (n+1)\,\Delta t) - \phi(x_k, n\,\Delta t) + a \int_{n \Delta t}^{(n+1) \Delta t} \frac{\partial \phi(x_k, s)}{\partial x}\,ds + b \int_{n \Delta t}^{(n+1) \Delta t} \phi(x_k, s)\,dB(s)$$

and

$$\mathcal{L}_k^n[\phi] = \phi(x_k, (n+1)\,\Delta t) - \phi(x_k, n\,\Delta t) + R\,\big(\phi(x_{k+1}, n\,\Delta t) - \phi(x_k, n\,\Delta t)\big) + b\,\phi(x_k, n\,\Delta t)\,\Delta B_n.$$

The expressions of $\mathcal{L}$ and $\mathcal{L}_k^n$ can be used to prove the convergence $E\big[|\mathcal{L}[\phi]_k^n -
\mathcal{L}_k^n[\phi]|^2\big] \to 0$ as $\Delta x, \Delta t \to 0$, which shows that the finite difference operator
is consistent in the mean square sense. We summarize the essential arguments used
to show that $\mathcal{L}_k^n$ is consistent. The complete proof can be found in [15]. ❖




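Proof The recurrence formula in (9.29) with $R \in (-1, 0)$ gives

$$\begin{aligned}
E\big[|U_k^{n+1}|^2\big] &= E\big[|(1 + R)\,U_k^n - R\,U_{k+1}^n - b\,U_k^n\,\Delta B_n|^2\big] \\
&= E\big[|(1 + R)\,U_k^n - R\,U_{k+1}^n|^2\big] + b^2\,\Delta t\,E\big[|U_k^n|^2\big] \\
&\le (1 + R)^2\,E\big[|U_k^n|^2\big] + R^2\,E\big[|U_{k+1}^n|^2\big] + 2\,|(1 + R)\,R|\,E\big[|U_k^n\,U_{k+1}^n|\big] + b^2\,\Delta t\,E\big[|U_k^n|^2\big] \\
&\le \big[(1 + R)^2 + |(1 + R)\,R| + b^2\,\Delta t\big]\,E\big[|U_k^n|^2\big] + \big[R^2 + |(1 + R)\,R|\big]\,E\big[|U_{k+1}^n|^2\big]
\end{aligned} \tag{9.30}$$

by using $2\,E\big[|U_k^n\,U_{k+1}^n|\big] \le E\big[|U_k^n|^2\big] + E\big[|U_{k+1}^n|^2\big]$. Since $|1 + R| + |R| = 1$ and
$|R| < 1$, we have

$$\sum_{k=-\infty}^{\infty} E\big[|U_k^{n+1}|^2\big] \le (1 + b^2\,\Delta t) \sum_{k=-\infty}^{\infty} E\big[|U_k^n|^2\big], \tag{9.31}$$

so that $E\big[\|U^{n+1}\|_{2,\Delta x}^2\big] \le (1 + b^2\,\Delta t)\,E\big[\|U^n\|_{2,\Delta x}^2\big]$.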
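The step from (9.30) to (9.31) rests on the fact that the two bracketed coefficients in (9.30) add up to $(|1 + R| + |R|)^2 + b^2\,\Delta t = 1 + b^2\,\Delta t$ when $R \in (-1, 0)$. The short sketch below checks this identity and the exponential bound on the growth factor numerically; the function names are assumptions introduced for this illustration.

```python
import math


def growth_factor(r, b, dt):
    """Sum of the two bracketed coefficients in (9.30)."""
    c_k = (1.0 + r) ** 2 + abs((1.0 + r) * r) + b**2 * dt   # multiplies E|U_k^n|^2
    c_k1 = r**2 + abs((1.0 + r) * r)                        # multiplies E|U_{k+1}^n|^2
    return c_k + c_k1


def stability_bound(b, t, n):
    """Factor (1 + b^2 t/(n+1))^(n+1) bounding E||U^{n+1}||^2 / E||U^0||^2."""
    return (1.0 + b**2 * t / (n + 1)) ** (n + 1)
```

For any $R \in (-1, 0)$, `growth_factor(R, b, dt)` equals `1 + b**2 * dt` up to rounding, and `stability_bound(b, t, n)` stays below `math.exp(b**2 * t)` for every `n`, which is the uniform bound used to establish stability.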

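The mean square discrepancy between the operators $\mathcal{L}[\phi]_k^n$ and $\mathcal{L}_k^n[\phi]$ is

$$\begin{aligned}
E\big[|\mathcal{L}[\phi]_k^n - \mathcal{L}_k^n[\phi]|^2\big] &= E\left[\left| a \int_{n \Delta t}^{(n+1) \Delta t} \left(\phi_x(x_k, s) - \frac{\phi_{k+1}^n - \phi_k^n}{\Delta x}\right) ds + b \int_{n \Delta t}^{(n+1) \Delta t} \big(\phi(x_k, s) - \phi_k^n\big)\,dB(s) \right|^2\right] \\
&\le 2\,a^2\,E\left[\left| \int_{n \Delta t}^{(n+1) \Delta t} \left(\phi_x(x_k, s) - \frac{\phi_{k+1}^n - \phi_k^n}{\Delta x}\right) ds \right|^2\right] + 2\,b^2\,E\left[\left| \int_{n \Delta t}^{(n+1) \Delta t} \big(\phi(x_k, s) - \phi_k^n\big)\,dB(s) \right|^2\right],
\end{aligned} \tag{9.32}$$

where $\phi_x = \partial \phi/\partial x$ and $\phi_k^n = \phi(x_k, n\,\Delta t)$, so that $E\big[|\mathcal{L}[\phi]_k^n - \mathcal{L}_k^n[\phi]|^2\big] \to 0$ as
$\Delta x, \Delta t \to 0$ since the latter expectation in (9.32) does not exceed
$E\big[\int_{n \Delta t}^{(n+1) \Delta t} |\phi(x_k, s) - \phi_k^n|^2\,ds\big]$.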

For the mean square convergence of the finite difference scheme, note that the
difference $Z_k^n = U(x_k, n\,\Delta t) - U_k^n$ satisfies the recurrence formula

$$Z_k^{n+1} = Z_k^n - a \int_{n \Delta t}^{(n+1) \Delta t} \left(\frac{\partial U(x_k, s)}{\partial x} - \frac{U_{k+1}^n - U_k^n}{\Delta x}\right) ds - b \int_{n \Delta t}^{(n+1) \Delta t} \big(U(x_k, s) - U_k^n\big)\,dB(s),$$

where

$$U(x_k, (n+1)\,\Delta t) = U(x_k, n\,\Delta t) - a \int_{n \Delta t}^{(n+1) \Delta t} \frac{\partial U(x_k, s)}{\partial x}\,ds - b \int_{n \Delta t}^{(n+1) \Delta t} U(x_k, s)\,dB(s).$$

Straightforward calculations and facts from [16, 17] are used in [15] to show that there
exists a constant $c > 0$ such that $E\big[\|Z^{n+1}\|_{2,\Delta x}^2\big] \le 4\,c\,\big(1 + (1 + n\,\Delta t)\,e^{b^2 n \Delta t}\big)\,\Delta t$,
so that $E\big[\|Z^{n+1}\|_{2,\Delta x}^2\big] \to 0$ as $\Delta t \to 0$. ▲


9.4 Applied SPDEs: Arbitrary Uncertainty 

The SPDEs encountered in applications are usually more general than those considered
in the mathematical literature. They are used to describe a broad range of
physical phenomena, and their coefficients, input processes, and/or end conditions
are Gaussian or non-Gaussian random functions of time and space that are much
more general than spatially dependent Gaussian white noise processes (Definition
9.2). Advanced concepts of random functions and Itô's calculus are primarily used in
the mathematical literature to solve SPDEs (Sect. 9.2). In the applied literature, the
Monte Carlo simulation, stochastic Galerkin, and collocation methods are usually
employed to solve SPDEs. A recent alternative to these methods is offered by stochastic
reduced order models (SROMs). Taylor, perturbation, Neumann series, and
related methods provide useful and simple solutions for SPDEs with random entries
of small uncertainty (Sect. 9.5).

Let $\mathcal{L}$ be a random differential operator, that is, a differential operator with
random coefficients defined on a probability space $(\Omega, \mathcal{F}, P)$ that involves both
spatial and temporal derivatives, and let $U(x,t)$ be the solution of the initial-boundary
value problem

$$\mathcal{L}[U(x,t)] = V(x,t), \quad x \in D, \quad t \in (0, \tau), \tag{9.33}$$

with random coefficients and input, where the input $V(x,t)$ is a random element on
$(\Omega, \mathcal{F}, P)$, $D$ denotes a bounded open subset of $\mathbb{R}^d$, and $(0, \tau)$ is a bounded time
interval. The differential equation in (9.33) is accompanied by initial and boundary
conditions that may be deterministic or random. If $\mathcal{L}$ and $V$ are time invariant and $\mathcal{L}$
involves only spatial derivatives, then (9.33) becomes a stochastic boundary value
problem denoted by $\mathcal{L}[U(x)] = V(x)$.

Analytical solutions for (9.33) and its time-invariant version $\mathcal{L}[U(x)] = V(x)$
are rarely possible, so numerical methods need to be used. Since these
methods can only solve problems with finite numbers of degrees of freedom and the
solution of (9.33) has an infinite number of degrees of freedom, it is only possible
to solve approximate versions of (9.33). Finite element, finite difference, and
other methods can be employed to discretize the physical space. For example, (9.33)
becomes an ordinary differential equation (ODE) with respect to time if its spatial
derivatives are approximated by finite differences. The resulting ODE has random
coefficients and input (Sect. 7.4). If the time argument is also discretized, the ODE
becomes a recurrence formula with random coefficients and input (Sect. 7.3). The
spatial discretization of the stochastic boundary value problem $\mathcal{L}[U(x)] = V(x)$
results in a stochastic algebraic equation (SAE) of the type discussed in Chap. 8.

The solution of approximations of (9.33) obtained by discretization of the physical
space by the methods in Chaps. 7 and 8 is likely to be inefficient since the number
of random variables in the resulting approximate equations can be very large. For
example, let $\Sigma(x)$ be a random field in the definition of (9.33). The ODE derived from
this equation by approximating all spatial derivatives by finite differences depends
on the values of $\Sigma(x)$ at the nodes of the finite difference scheme. The number of
nodes can be excessive if high resolution meshes in space are required and/or the
domain of definition of (9.33) is a subset of $\mathbb{R}^3$ rather than an interval of the real line.
Moreover, numerical difficulties may be encountered since values of $\Sigma(x)$ at closely
spaced nodes are likely to be strongly dependent. It is necessary to also discretize the
probability space, which means to represent the random elements in the definition
of (9.33) by parametric models that must (1) depend on a relatively small number of
random variables for computational reasons and (2) satisfy both mathematical and
physical constraints to guarantee the solution existence and uniqueness and provide
realistic representations for physical properties (Sect. 6.3.3).
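To make the two requirements on parametric models concrete, the sketch below defines a random field on $(0, l)$ that depends on a handful of random variables and takes values in a prescribed interval $[a, b]$. It is a hypothetical model chosen for illustration, not one proposed in the text: the sine basis, the $1/(i+1)$ weights, and the mapping through the standard normal distribution function are all assumptions.

```python
import math
import random


def parametric_field(x, z, a, b, length):
    """Value at x of a bounded field driven by the finite set of variables z.

    g(x) = sum_i z_i sin((i+1) pi x / length) / (i+1) is mapped into [a, b]
    through the standard normal distribution function (illustrative choice).
    """
    g = sum(zi * math.sin((i + 1) * math.pi * x / length) / (i + 1)
            for i, zi in enumerate(z))
    phi = 0.5 * (1.0 + math.erf(g / math.sqrt(2.0)))   # values in [0, 1]
    return a + (b - a) * phi


def sample_field(nodes, n_vars, a, b, length, rng):
    """One sample of the field at the given nodes, with z_i independent N(0, 1)."""
    z = [rng.gauss(0.0, 1.0) for _ in range(n_vars)]
    return [parametric_field(x, z, a, b, length) for x in nodes]
```

Every sample produced by `sample_field` stays within $[a, b]$ regardless of the number of nodes, and the number of random variables is `n_vars` rather than the number of mesh nodes.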

The solutions of SPDEs use different representations for the random elements in
the definition of these equations. In the Monte Carlo method, the random elements
in (9.33) are represented by a relatively large number of independent, equally likely
samples of these elements. In the method based on stochastic reduced order models
(SROMs), the random elements are represented by a relatively small number of
samples that are not equally likely. In the stochastic Galerkin method, the random
elements are approximated by parametric models usually obtained by truncating
Karhunen-Loève representations of these elements and, subsequently, describing
the random variables $\{Z_i\}$ in these representations by truncated polynomial chaos
expansions. In the stochastic collocation method, approximations are constructed for
$U(x,t)$ by interpolating between deterministic solutions of (9.33) corresponding to
selected values of $\{Z_i\}$ in the range of these variables.
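The difference between equally likely Monte Carlo samples and the unequally weighted samples of an SROM shows up in how solution statistics are estimated. The sketch below contrasts the two estimators for a generic response function; it does not construct an SROM (the selection of samples and probabilities is a separate optimization problem not covered here), and the function names are assumptions made for this illustration.

```python
def monte_carlo_mean(samples, response):
    """Estimate E[response(Z)] from independent, equally likely samples of Z."""
    return sum(response(z) for z in samples) / len(samples)


def weighted_sample_mean(samples, probabilities, response):
    """Estimate E[response(Z)] from a small set of samples with unequal probabilities."""
    if abs(sum(probabilities) - 1.0) > 1e-12:
        raise ValueError("probabilities must sum to 1")
    return sum(p * response(z) for z, p in zip(samples, probabilities))
```

In both cases `response` stands for a deterministic solver evaluated at one sample of the random elements, so the cost of either method is proportional to the number of samples used.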


9.4.1 General Considerations 

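Let $U(x)$ be the solution of a stochastic boundary value problem $\mathcal{L}[U(x)] = V(x)$,
$x \in D$, accompanied by conditions that $U$ must satisfy on the boundary $\partial D$ of $D$,
where $D$ is an open bounded subset of $\mathbb{R}^d$. It is assumed that $\mathcal{L}$ is a linear
differential operator defined by

$$\mathcal{L}[U(x)] = \sum_{|\alpha|, |\beta| \le m} (-1)^{|\alpha|}\,\mathcal{D}^\alpha \big(a_{\alpha\beta}(x)\,\mathcal{D}^\beta U(x)\big), \tag{9.34}$$

where $a_{\alpha\beta} : D \to \mathbb{R}$ are real-valued functions, $\alpha = (\alpha_1, \ldots, \alpha_d)$, $\alpha_i \ge 0$ are
integers, $|\alpha| = \alpha_1 + \cdots + \alpha_d = 0, 1, \ldots, m$, and $\mathcal{D}^\alpha = \partial^{|\alpha|}/\partial x_1^{\alpha_1} \cdots \partial x_d^{\alpha_d}$ denotes
the $\alpha$-th weak derivative with respect to the spatial arguments of $U$ ([18], Chap. 7).
For a second order problem, $m = 1$ so that $\mathcal{L}$ has the form

$$\mathcal{L}[U(x)] = a_0(x)\,U(x) - \sum_{i,j=1}^{d} \frac{\partial}{\partial x_i} \left(a_{ij}(x)\,\frac{\partial U(x)}{\partial x_j}\right) + \sum_{i=1}^{d} a_i(x)\,\frac{\partial U(x)}{\partial x_i}. \tag{9.35}$$

Recall that a function $f \in L^1_{\mathrm{loc}}(D)$ has a weak derivative $\mathcal{D}^\alpha f$ if there exists
$g \in L^1_{\mathrm{loc}}(D)$ such that $\int_D g(x)\,\phi(x)\,dx = (-1)^{|\alpha|} \int_D f(x)\,\phi^{(\alpha)}(x)\,dx$ holds for all
$\phi \in C_0^\infty(D)$, where $L^1_{\mathrm{loc}}(D)$ is the set of functions $f : D \to \mathbb{R}$ with the property
$f \in L^1(K)$ for all compacts $K$ in the interior of $D$ and $C_0^\infty(D)$ is the set of functions
$\phi \in C^\infty(D)$ with compact support ([18], Chap. 11).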
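The defining identity of the weak derivative can be checked numerically for a simple case. The sketch below takes $f(x) = |x|$ on $D = (-1, 1)$, whose weak derivative of order 1 is $g(x) = \mathrm{sign}(x)$, and compares the two sides of the identity for one test function. The particular bump function, its center and radius, and the midpoint quadrature are assumptions chosen for this illustration.

```python
import math


def bump(x, c=0.3, r=0.5):
    """Infinitely differentiable test function supported in (c - r, c + r)."""
    s = (x - c) / r
    return math.exp(-1.0 / (1.0 - s * s)) if abs(s) < 1.0 else 0.0


def bump_prime(x, c=0.3, r=0.5):
    """Classical derivative of bump with respect to x."""
    s = (x - c) / r
    if abs(s) >= 1.0:
        return 0.0
    return bump(x, c, r) * (-2.0 * s / (1.0 - s * s) ** 2) / r


def weak_derivative_check(n=20000):
    """Return (int g*phi dx, -int f*phi' dx) on (-1, 1) for f = |x|, g = sign(x)."""
    h = 2.0 / n
    lhs = rhs = 0.0
    for i in range(n):
        x = -1.0 + (i + 0.5) * h                  # midpoint rule nodes
        sign = 1.0 if x > 0.0 else -1.0
        lhs += sign * bump(x) * h
        rhs += -abs(x) * bump_prime(x) * h        # factor (-1)^{|alpha|} with |alpha| = 1
    return lhs, rhs
```

The two returned values agree to quadrature accuracy, which is the statement that $\mathrm{sign}(x)$ is the weak derivative of $|x|$ even though $|x|$ is not differentiable at the origin.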

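Definition 9.3 The operator $\mathcal{L}$ in (9.34) is elliptic if $\sum_{|\alpha|, |\beta| = m} a_{\alpha\beta}(x)\,\xi^{\alpha+\beta} \ne 0$ for
all $x \in D$ and arbitrary $\xi = (\xi_1, \ldots, \xi_d) \in \mathbb{R}^d$, $\xi \ne 0$, where $\alpha_i \ge 0$, $i = 1, \ldots, d$,
are integers and $\xi^\alpha$ is a short hand notation for $\xi_1^{\alpha_1} \cdots \xi_d^{\alpha_d}$. The operator $\mathcal{L}$ is said to be
strongly elliptic if there exists $q > 0$ such that $\big|\sum_{|\alpha|, |\beta| = m} a_{\alpha\beta}(x)\,\xi^{\alpha+\beta}\big| \ge q\,\|\xi\|^{2m}$.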

Prior to considering partial differential equations with operators given by (9.34),
we define a set of functions $H^m(D)$ that constitutes a natural home for the solution of
these equations. The set $H^m(D)$, referred to as Sobolev space, consists of real-valued
functions defined on $D$ with the properties

$$H^m(D) = \{U : D \to \mathbb{R},\ \mathcal{D}^\alpha U \in L^2(D) \text{ for } |\alpha| \le m\}, \tag{9.36}$$

where $m \ge 0$ is an integer and $L^2(D)$ denotes the set of real-valued square integrable
functions defined on $D$. Note that $H^n(D) \subset H^m(D)$ for $n \ge m$ and that $H^0(D) =
L^2(D)$.

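Theorem 9.2 The set $H^m(D)$ with the norm

$$\|U\|_{H^m(D)}^2 = (U, U)_{H^m(D)} = \sum_{|\alpha| \le m} \|\mathcal{D}^\alpha U\|_{L^2(D)}^2 \tag{9.37}$$

is a Hilbert space, where $\|\cdot\|_{H^m(D)}$ is induced by the inner product

$$(U, V)_{H^m(D)} = \int_D \sum_{|\alpha| \le m} \big(\mathcal{D}^\alpha U(x)\big)\,\big(\mathcal{D}^\alpha V(x)\big)\,dx = \sum_{|\alpha| \le m} (\mathcal{D}^\alpha U, \mathcal{D}^\alpha V)_{L^2(D)} \tag{9.38}$$

and $(\cdot, \cdot)_{L^2(D)}$ denotes the inner product in $L^2(D)$.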

Proof Simple arguments and the definitions in (B.8) and (B.15) show that $H^m(D)$
is a linear space and that $(\cdot, \cdot)_{H^m(D)}$ is an inner product on it. It remains to show that
$H^m(D)$ with the norm in (9.37) is complete.

Let $\{U_p\}$ be a Cauchy sequence in $H^m(D)$, that is, $\|U_p - U_q\|_{H^m(D)} \to 0$ as
$p, q \to \infty$. This implies $\|\mathcal{D}^\alpha (U_p - U_q)\|_{L^2(D)} \to 0$ as $p, q \to \infty$ by the second
equality in (9.37), so that $\{\mathcal{D}^\alpha U_p\}$ is Cauchy in $L^2(D)$ for each $|\alpha| \le m$. Since
$L^2(D)$ is complete, $\mathcal{D}^\alpha U_p$ converges to a function $U^{(\alpha)} \in L^2(D)$. We have

$$\begin{aligned}
\int_D U^{(\alpha)}(x)\,\phi(x)\,dx &= \lim_{p \to \infty} (\mathcal{D}^\alpha U_p, \phi)_{L^2(D)} = (-1)^{|\alpha|} \lim_{p \to \infty} (U_p, \mathcal{D}^\alpha \phi)_{L^2(D)} \\
&= (-1)^{|\alpha|}\,\big(\mathrm{l.i.m.}_{p \to \infty} U_p,\ \mathcal{D}^\alpha \phi\big)_{L^2(D)} = (-1)^{|\alpha|} \int_D U(x)\,\mathcal{D}^\alpha \phi(x)\,dx
\end{aligned} \tag{9.39}$$

by the continuity of the inner product and properties of generalized partial derivatives,
where l.i.m. denotes mean square limit and $\phi : D \to \mathbb{R}$ is an arbitrary
infinitely differentiable function with compact support ([8], Sect. 1.2, [18], Chap. 7).
The equality $\int_D U^{(\alpha)}(x)\,\phi(x)\,dx = (-1)^{|\alpha|} \int_D U(x)\,\mathcal{D}^\alpha \phi(x)\,dx$ shows that $U^{(\alpha)}$
is the weak derivative of order $\alpha$ of $U$. Since this function and all its derivatives of
order $|\alpha| \le m$ are in $L^2(D)$, we have $U \in H^m(D)$. ▲
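As a concrete instance of the norm in (9.37), take $d = 1$, $m = 1$, $D = (0, 1)$, and $U(x) = \sin(\pi x)$, for which $\|U\|_{H^1(D)}^2 = \int_0^1 (U^2 + (U')^2)\,dx = 1/2 + \pi^2/2$. The sketch below approximates this norm by a midpoint rule; the function name and the quadrature rule are assumptions made for illustration, and the derivative is supplied in closed form rather than computed weakly.

```python
import math


def h1_norm_squared(u, du, lower, upper, n=2000):
    """Midpoint-rule approximation of the integral of u^2 + (u')^2 over (lower, upper)."""
    h = (upper - lower) / n
    total = 0.0
    for i in range(n):
        x = lower + (i + 0.5) * h
        total += (u(x) ** 2 + du(x) ** 2) * h
    return total


def example_norm():
    """Squared H^1(0, 1) norm of sin(pi x), to be compared with 1/2 + pi^2/2."""
    return h1_norm_squared(lambda x: math.sin(math.pi * x),
                           lambda x: math.pi * math.cos(math.pi * x),
                           0.0, 1.0)
```

The $L^2$ part contributes $1/2$ and the derivative part $\pi^2/2$, which shows how the $H^1$ norm penalizes oscillations that the $L^2$ norm alone does not see.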


9.4.2 Deterministic Boundary Value Problems 

Consider the deterministic boundary value problem

$$\mathcal{L}[U(x)] = V(x), \quad x \in D \subset \mathbb{R}^d, \tag{9.40}$$

with the boundary conditions (BCs) $B_k U = g_k$, $k = 0, 1, \ldots, m - 1$, where $\mathcal{L}$
is given by (9.34) and $\{B_k\}$ denote linear differential operators. It is assumed that
the boundary value problem (9.40) is well-posed, that is, it has a unique solution
that depends continuously on data in the sense that there exists a constant $c > 0$
such that $\|U\|_{H^s(D)} \le c\,\|V\|_{H^{s-2m}(D)}$ provided $U \in H^s(D)$. If (9.40) has this
property and $U_k(x)$ are solutions for data $V_k(x)$, $k = 1, 2$, then $\|U_1 - U_2\|_{H^s(D)} \le
c\,\|V_1 - V_2\|_{H^{s-2m}(D)}$ so that small changes in data can only yield small changes in the
solution. A useful discussion on the existence, uniqueness, dependence on data, and
regularity for solutions of elliptic boundary value problems can be found in ([18],
Chap. 8). We are interested in conditions under which well-posed boundary value
problems admit unique weak solutions.

In the remainder of this section, it is assumed that $\mathcal{L}$ in (9.40) is an elliptic operator
and that the essential boundary conditions for this equation, that is, boundary
conditions involving differential operators of order strictly smaller than
$m$ ([18], Sect. 9.2), are homogeneous. Let

$$\mathcal{V}^m(D) = \{W \in H^m(D) : W \text{ satisfies all essential BCs}\} \tag{9.41}$$

be the space of admissible functions. Arguments similar to those used to show that
$H^m(D)$ with the norm in (9.37) is a Hilbert space can be used to show that $\mathcal{V}^m(D)$
with the same norm is a Hilbert space. The weak form of (9.40) is

$$\mathcal{A}(U, W) = (V, W), \quad \forall\, W \in \mathcal{V}^m(D), \tag{9.42}$$

where

$$\mathcal{A}(U, W) = \int_D \sum_{|\alpha|, |\beta| \le m} a_{\alpha\beta}(x)\,\mathcal{D}^\beta U(x)\,\mathcal{D}^\alpha W(x)\,dx + \text{boundary terms}$$

is a bilinear form and $(V, \cdot) : \mathcal{V}^m(D) \to \mathbb{R}$ is a linear functional ([18], Sect. 9.2).
This form results by multiplying both sides of (9.40) with $W \in \mathcal{V}^m(D)$, integrating
over $D$, and using Green's theorem.

Our objective is to find $U \in \mathcal{V}^m(D)$ satisfying (9.42), that is, the weak solution
of (9.40). If the bilinear form $\mathcal{A}(\cdot, \cdot)$ is bounded and elliptic, that is, if there exist
constants $M > 0$ and $\beta > 0$ such that $|\mathcal{A}(U, W)| \le M\,\|U\|_{\mathcal{V}^m(D)}\,\|W\|_{\mathcal{V}^m(D)}$ and
$\mathcal{A}(W, W) \ge \beta\,\|W\|_{\mathcal{V}^m(D)}^2$ for all $U, W \in \mathcal{V}^m(D)$, then (9.42) admits a unique
solution by the Lax-Milgram theorem (Theorem B.44). Moreover, the solution depends
continuously on data ([18], Theorem 1, p. 316).

Example 9.4 Let $U : D \to \mathbb{R}$ be the solution of the boundary value problem

$$-\nabla \cdot \big(a(x)\,\nabla U(x)\big) = f(x), \quad x \in D, \tag{9.43}$$

with $U(x) = 0$ for $x \in \partial D$. The weak form of this equation is

$$\mathcal{B}(U, W) = (f, W)_{L^2(D)}, \quad \forall\, W \in \mathcal{W}(D), \tag{9.44}$$

where $(f, W)_{L^2(D)} = \int_D f(x)\,W(x)\,dx$,

$$\mathcal{W}(D) = \{W : D \to \mathbb{R},\ W \in L^2(D),\ \nabla W \in L^2(D), \text{ and } W = 0 \text{ on } \partial D\} \tag{9.45}$$

is the linear space $\mathcal{V}^1(D)$ in (9.41), and

$$\mathcal{B}(U, W) = \int_D a(x)\,\nabla U(x) \cdot \nabla W(x)\,dx, \quad U, W \in \mathcal{W}(D), \tag{9.46}$$

is a real-valued bilinear form defined on $\mathcal{W}(D) \times \mathcal{W}(D)$. Note that the left and
the right sides of (9.44) correspond to $\mathcal{A}(U, W)$ and $(V, W)$ given by (9.42) and
that the inner product and norm on $\mathcal{W}(D)$ are those on $\mathcal{V}^1(D)$ in (9.41), that is,
$(U, W)_{\mathcal{W}(D)} = \int_D (U\,W + \nabla U \cdot \nabla W)\,dx$ and $\|U\|_{\mathcal{W}(D)}^2 = \int_D (U^2 + \nabla U \cdot \nabla U)\,dx$.
If $f \in L^2(D)$ and $a(x) \in [\alpha, \beta]$, $0 < \alpha < \beta < \infty$, for all $x \in D$, then (9.44) admits
a unique solution. ❖

Proof The right side of (9.43) multiplied with $W \in \mathcal{W}(D)$ and integrated over $D$
gives $(f, W) = \int_D f(x)\,W(x)\,dx$. The left side of (9.43) multiplied with $W \in \mathcal{W}(D)$
and integrated over $D$ gives

$$-\sum_{i=1}^{d} \int_D \frac{\partial}{\partial x_i} \left(a(x)\,\frac{\partial U(x)}{\partial x_i}\right) W(x)\,dx = \int_D a(x)\,\nabla U(x) \cdot \nabla W(x)\,dx, \quad \forall\, W \in \mathcal{W}(D),$$

by Green's theorem ([18], Sects. 7.2 and 7.4) that can be given in the form

$$\int_D u\,\frac{\partial v}{\partial x_i}\,dx = \int_{\partial D} u\,v\,n_i\,ds - \int_D \frac{\partial u}{\partial x_i}\,v\,dx, \tag{9.47}$$

where $D$ is an open bounded set in $\mathbb{R}^d$, $n(x) = (n_1(x), \ldots, n_d(x))$ denotes the
outward unit normal to $D$ at $x \in \partial D$, and $u, v \in H^1(D)$.

That $\mathcal{W}(D)$ in (9.45) is a linear space and

$$(U, W)_{\mathcal{W}(D)} = \int_D (U\,W + \nabla U \cdot \nabla W)\,dx$$

defines an inner product on this space follows by straightforward arguments. Since
$f \in L^2(D)$, the right side of (9.44) is linear in $W$ and bounded. The bilinear form
$\mathcal{B}$ is continuous since

$$\begin{aligned}
|\mathcal{B}(U, W)| &= \left| \int_D \big(a(x)^{1/2}\,\nabla U(x)\big) \cdot \big(a(x)^{1/2}\,\nabla W(x)\big)\,dx \right| \\
&\le \left( \int_D a(x)\,\nabla U(x) \cdot \nabla U(x)\,dx \int_D a(x)\,\nabla W(x) \cdot \nabla W(x)\,dx \right)^{1/2} \\
&\le \beta\,\|U\|_{\mathcal{W}(D)}\,\|W\|_{\mathcal{W}(D)}, \quad \forall\, U, W \in \mathcal{W}(D),
\end{aligned}$$

by the Cauchy-Schwarz inequality, the assumption $a(x) \in [\alpha, \beta]$, $0 < \alpha < \beta < \infty$,
$x \in D$, and the inequality

$$\int_D \nabla W(x) \cdot \nabla W(x)\,dx \le \int_D \big(W(x)^2 + \nabla W(x) \cdot \nabla W(x)\big)\,dx = \|W\|_{\mathcal{W}(D)}^2$$

for all $W \in \mathcal{W}(D)$.

The bilinear form in (9.46) is also elliptic since

$$\mathcal{B}(W, W) = \int_D a(x)\,\nabla W(x) \cdot \nabla W(x)\,dx \ge \alpha \int_D \nabla W(x) \cdot \nabla W(x)\,dx = \alpha\,\|\nabla W\|_{L^2(D)}^2$$

by properties of $a(x)$. Also, there exists a constant $c > 0$ such that $\int_D W^2\,dx \le
c \int_D \nabla W \cdot \nabla W\,dx$ by the Poincaré-Friedrichs inequality ([18], Theorem 9, Chap. 7),
so that $\|W\|_{\mathcal{W}(D)}^2 = \|W\|_{L^2(D)}^2 + \|\nabla W\|_{L^2(D)}^2 \le (1 + c)\,\|\nabla W\|_{L^2(D)}^2$, which
gives $\|\nabla W\|_{L^2(D)}^2 \ge \|W\|_{\mathcal{W}(D)}^2/(1 + c)$. The latter inequality and $\mathcal{B}(W, W) \ge
\alpha\,\|\nabla W\|_{L^2(D)}^2$ imply $\mathcal{B}(W, W) \ge [\alpha/(1 + c)]\,\|W\|_{\mathcal{W}(D)}^2$. Then (9.44) has a unique
solution by the Lax-Milgram theorem (Theorem B.44). ▲
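The weak form (9.44) is also the starting point for numerical solutions. The sketch below solves it for $d = 1$ and $D = (0, 1)$ with piecewise linear finite elements, restricting both $U$ and the test functions $W$ to the span of hat functions that vanish at the boundary. It is a minimal illustration under stated assumptions: the uniform mesh, the midpoint evaluation of $a$ and $f$ on each element, the tridiagonal solver, and the function name are choices made here, not part of the text.

```python
def solve_weak_form(a, f, n):
    """Piecewise linear finite element solution of B(U, W) = (f, W) on (0, 1), U = 0 on the boundary.

    a, f: callables for the coefficient and right side; n >= 2: number of elements.
    Returns nodal values at x_k = k/n, k = 0, ..., n.
    """
    h = 1.0 / n
    a_mid = [a((e + 0.5) * h) for e in range(n)]     # coefficient on element e
    f_mid = [f((e + 0.5) * h) for e in range(n)]     # right side on element e
    # Symmetric tridiagonal system for interior nodes k = 1, ..., n - 1.
    diag = [(a_mid[k - 1] + a_mid[k]) / h for k in range(1, n)]
    off = [-a_mid[k] / h for k in range(1, n - 1)]   # coupling of nodes k and k + 1
    rhs = [0.5 * h * (f_mid[k - 1] + f_mid[k]) for k in range(1, n)]
    # Forward elimination (Thomas algorithm).
    for i in range(1, n - 1):
        w = off[i - 1] / diag[i - 1]
        diag[i] -= w * off[i - 1]
        rhs[i] -= w * rhs[i - 1]
    # Back substitution.
    u = [0.0] * (n - 1)
    u[-1] = rhs[-1] / diag[-1]
    for i in range(n - 3, -1, -1):
        u[i] = (rhs[i] - off[i] * u[i + 1]) / diag[i]
    return [0.0] + u + [0.0]
```

For $a(x) = 1$ and $f(x) = 1$ the exact solution of (9.43) is $U(x) = x\,(1 - x)/2$, and the nodal values returned by `solve_weak_form(lambda x: 1.0, lambda x: 1.0, 8)` reproduce it up to rounding. The stiffness matrix is symmetric and positive definite for $a(x) \ge \alpha > 0$, which is the discrete counterpart of the ellipticity of $\mathcal{B}$.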


9.4.3 Stochastic Boundary Value Problems 

Consider now the stochastic version of (9.40), that is, the coefficients of the differential
operator $\mathcal{L}$, the right side, and/or the boundary conditions of this equation
depend on random elements defined on a probability space $(\Omega, \mathcal{F}, P)$, so that the
resulting solution $U : D \times \Omega \to \mathbb{R}$ is a function of $x \in D$ and $\omega \in \Omega$. Let
$(D \times \Omega, \mathcal{B}(D) \times \mathcal{F}, \lambda \times P)$ denote the product of measure spaces $(D, \mathcal{B}(D), \lambda)$
and $(\Omega, \mathcal{F}, P)$, where $\mathcal{B}(D)$ is the Borel $\sigma$-field on $D$ and $\lambda$ denotes the Lebesgue
measure on $\mathbb{R}^d$. The analog of the space of admissible functions $\mathcal{V}^m(D)$ in (9.41)
for stochastic boundary value problems is the set

$$\mathcal{V}^m(D, \Omega) = \{W : D \times \Omega \to \mathbb{R},\ \mathcal{D}^\alpha W \in L^2(D \times \Omega, \mathcal{B}(D) \times \mathcal{F}, \lambda \times P),\ |\alpha| \le m,\ W \text{ is } \mathcal{B}(D) \times \mathcal{F}\text{-measurable and satisfies } P\text{-a.s. all essential BCs}\}, \tag{9.48}$$

where $\mathcal{D}^\alpha W$ denotes the $\alpha$-th weak derivative of $W$ with respect to spatial arguments
([18], Chap. 7) and $\lambda(dx) = dx$ is the infinitesimal volume in the physical space.

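Theorem 9.3 The set of functions $\mathcal{V}^m(D, \Omega)$ with the norm

$$\|U\|_{\mathcal{V}^m(D, \Omega)}^2 = E\left[\sum_{|\alpha| \le m} \|\mathcal{D}^\alpha U\|_{L^2(D)}^2\right] \tag{9.49}$$

is a Hilbert space, where $\|\cdot\|_{\mathcal{V}^m(D, \Omega)}$ is the norm induced by the inner product

$$(U, W)_{\mathcal{V}^m(D, \Omega)} = E\left[\int_D \sum_{|\alpha| \le m} \big(\mathcal{D}^\alpha U(x, \cdot)\big)\,\big(\mathcal{D}^\alpha W(x, \cdot)\big)\,dx\right] = E\left[\sum_{|\alpha| \le m} (\mathcal{D}^\alpha U, \mathcal{D}^\alpha W)_{L^2(D)}\right] = \sum_{|\alpha| \le m} (\mathcal{D}^\alpha U, \mathcal{D}^\alpha W)_{L^2(D \times \Omega)}, \tag{9.50}$$

and $(\cdot, \cdot)_{L^2(D \times \Omega)}$ denotes the inner product on $L^2(D \times \Omega)$.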


Proof That $\mathcal{V}^m(D, \Omega)$ is a linear space and (9.50) is an inner product on this space
follows by checking that the defining properties of linear spaces and inner products
hold. Since $\mathcal{V}^m(D, \Omega)$ consists of $\mathcal{B}(D) \times \mathcal{F}$-measurable and $\lambda \times P$-integrable functions,
$U(x, \cdot)$ is $\mathcal{F}$-measurable and $P$-integrable, $U(\cdot, \omega)$ is $\mathcal{B}(D)$-measurable and $\lambda$-integrable,
the expectation $\int_\Omega U(\cdot, \omega)\,P(d\omega)$ is $\mathcal{B}(D)$-measurable and $\lambda$-integrable,
the integral $\int_D U(x, \cdot)\,\lambda(dx)$ is $\mathcal{F}$-measurable and $P$-integrable,
and the order of the integrals over $D$ and $\Omega$ can be interchanged by Fubini's theorem
(Theorem 2.9).

Consider a Cauchy sequence $\{U_p\}$ in $\mathcal{V}^m(D, \Omega)$, that is, $\|U_p - U_q\|_{\mathcal{V}^m(D, \Omega)} \to 0$
as $p, q \to \infty$, which implies $\|\mathcal{D}^\alpha (U_p - U_q)\|_{L^2(D \times \Omega)} \to 0$ as $p, q \to \infty$ for $|\alpha| \le
m$. Since $L^2(D \times \Omega)$ is complete, $\mathcal{D}^\alpha U_p$ converges to an element $U^{(\alpha)} \in L^2(D \times \Omega)$
as $p \to \infty$. Let $\phi : D \times \Omega \to \mathbb{R}$ be as in (9.39) for almost all $\omega \in \Omega$. Calculations
as in this equation give

$$E\left[\int_D U^{(\alpha)}(x, \omega)\,\phi(x, \omega)\,\lambda(dx)\right] = E\left[(-1)^{|\alpha|} \int_D U(x, \omega)\,\mathcal{D}^\alpha \phi(x, \omega)\,\lambda(dx)\right] \tag{9.51}$$

for $|\alpha| \le m$, so that

$$\int_D U^{(\alpha)}(x, \omega)\,\phi(x, \omega)\,\lambda(dx) = (-1)^{|\alpha|} \int_D U(x, \omega)\,\mathcal{D}^\alpha \phi(x, \omega)\,\lambda(dx) \tag{9.52}$$

holds $P$-a.s., which shows that $U^{(\alpha)}$ is the weak derivative of order $\alpha$ of $U$ for almost
all $\omega \in \Omega$ with respect to the spatial argument $x \in D$. Since $U^{(\alpha)}$ is in $L^2(D \times \Omega)$
for all $|\alpha| \le m$, we have $U \in \mathcal{V}^m(D, \Omega)$. ▲




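Example 9.5 Suppose the coefficients and the right side of (9.43) are random functions
on a probability space $(\Omega, \mathcal{F}, P)$, so that its solution $U$ is a random function
defined on the same probability space that satisfies the stochastic boundary value
problem

$$-\nabla \cdot \big(a(x, \omega)\,\nabla U(x, \omega)\big) = f(x, \omega), \quad (x, \omega) \in D \times \Omega, \tag{9.53}$$

with $U(x, \omega) = 0$ for $x \in \partial D$ and almost all $\omega \in \Omega$ and $f \in L^2(D \times \Omega)$. The weak
form of (9.53) is

$$\mathcal{B}(U, W) = (f, W)_{L^2(D \times \Omega)}, \quad \forall\, W \in \mathcal{W}(D, \Omega), \tag{9.54}$$

where $(f, W)_{L^2(D \times \Omega)} = E\big[\int_D f\,W\,dx\big]$,

$$\mathcal{W}(D, \Omega) = \{W : D \times \Omega \to \mathbb{R},\ W \in L^2(D \times \Omega),\ \nabla W \in L^2(D \times \Omega),\ W \text{ is } \mathcal{B}(D) \times \mathcal{F}\text{-measurable, and } W(x, \cdot) = 0,\ x \in \partial D,\ P\text{-a.s.}\} \tag{9.55}$$

is a linear space, and

$$\mathcal{B}(U, W) = E\left[\int_D a(x, \cdot)\,\nabla U(x, \cdot) \cdot \nabla W(x, \cdot)\,dx\right] \tag{9.56}$$

is a real-valued bilinear form defined on $\mathcal{W}(D, \Omega) \times \mathcal{W}(D, \Omega)$. The inner product
and the norm on $\mathcal{W}(D, \Omega)$ are those on $\mathcal{V}^1(D, \Omega)$, that is, $(U, W)_{\mathcal{W}(D, \Omega)} =
E\big[\int_D (U\,W + \nabla U \cdot \nabla W)\,dx\big]$ and $\|W\|_{\mathcal{W}(D, \Omega)}^2 = E\big[\int_D (W^2 + \nabla W \cdot \nabla W)\,dx\big]$.
If the functions $a, f : D \times \Omega \to \mathbb{R}$ are $\mathcal{B}(D) \times \mathcal{F}$-measurable and $\lambda \times P$-integrable,
$a(x, \omega) \in [\alpha, \beta]$, $0 < \alpha < \beta < \infty$, for all $x \in D$ and almost all $\omega \in \Omega$,
and $f \in L^2(D \times \Omega)$, then (9.54) admits a unique solution. Moreover, the solution
of (9.54) is stable. Note that $\mathcal{B}(D)$ and $\mathcal{B}(U, W)$ in (9.56) have different meanings.
❖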

Proof The functional $(f, W)_{L^2(D \times \Omega)} = E\big[\int_D f(x)\,W(x)\,dx\big]$ of $W \in \mathcal{W}(D, \Omega)$ is
linear and bounded. The bilinear form in (9.56) is bounded since

$$\begin{aligned}
|\mathcal{B}(U, W)| &= \left| \int_\Omega \int_D a(x, \omega)\,\nabla U(x, \omega) \cdot \nabla W(x, \omega)\,\lambda(dx)\,P(d\omega) \right| \\
&\le \left( E\left[\int_D a\,\nabla U \cdot \nabla U\,dx\right] E\left[\int_D a\,\nabla W \cdot \nabla W\,dx\right] \right)^{1/2} \\
&\le \beta\,\|U\|_{\mathcal{W}(D, \Omega)}\,\|W\|_{\mathcal{W}(D, \Omega)}, \quad \forall\, U, W \in \mathcal{W}(D, \Omega),
\end{aligned}$$

by the Cauchy-Schwarz inequality, the relationship $\|\nabla W\|_{L^2(D \times \Omega)} \le \|W\|_{\mathcal{W}(D, \Omega)}$,
and properties of $a(x, \omega)$. Arguments as in Example 9.4 show that this form also
has the property $\mathcal{B}(U, U) \ge c'\,\|U\|_{\mathcal{W}(D, \Omega)}^2$, $\forall\, U \in \mathcal{W}(D, \Omega)$, where $c' > 0$ is a
constant. The Lax-Milgram theorem (Theorem B.44) implies that (9.54) admits a
unique solution (see also Sect. 9.4.8.3).

The solution stability follows from the inequality $\mathcal{B}(U, U) \ge c'\,\|U\|_{\mathcal{W}(D, \Omega)}^2$ and
the fact that $(f, U)$ is a linear continuous functional. The latter observation implies
that $(f, U)$ is bounded (Theorem B.23), that is, there exists a constant $c'' > 0$
such that $|(f, U)| \le c''\,\|U\|_{\mathcal{W}(D, \Omega)}$, $U \in \mathcal{W}(D, \Omega)$. These inequalities and (9.54)
give $c'\,\|U\|_{\mathcal{W}(D, \Omega)}^2 \le \mathcal{B}(U, U) = |\mathcal{B}(U, U)| = |(f, U)| \le c''\,\|U\|_{\mathcal{W}(D, \Omega)}$, so
that $\|U\|_{\mathcal{W}(D, \Omega)} \le c''/c'$. Solution uniqueness follows from this condition since,
if $\{U_k\}$ are solutions, that is, $\mathcal{B}(U_k, W) = (f, W)_{L^2(D \times \Omega)}$, $k = 1, 2$, we have
$\mathcal{B}(U_1 - U_2, W) = 0$ for all $W \in \mathcal{W}(D, \Omega)$ so that $\|U_1 - U_2\|_{\mathcal{W}(D, \Omega)} = 0$. ▲
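The stability of the stochastic solution can be illustrated with a case that has a closed-form answer. Assume, only for this sketch, that $d = 1$, $D = (0, 1)$, $f = 1$, and $a(x, \omega) = A(\omega)$ is constant in space with $A$ uniformly distributed on $[\alpha, \beta]$. Then $U(x, \omega) = x\,(1 - x)/(2\,A(\omega))$ and $\|U\|_{\mathcal{W}(D, \Omega)}^2 = E[A^{-2}]\,(1/120 + 1/12)$, which is bounded by its value for $A = \alpha$. The function name and the uniform distribution are assumptions introduced here.

```python
import random


def mc_norm_squared(alpha, beta, n_samples, rng):
    """Monte Carlo estimate of E[ int_0^1 (U^2 + (dU/dx)^2) dx ] for -(A U')' = 1.

    U(0) = U(1) = 0 and A is uniform on [alpha, beta] (illustrative assumption),
    so that U(x) = x (1 - x) / (2 A) for each sample of A.
    """
    # int_0^1 (x(1-x)/2)^2 dx = 1/120 and int_0^1 ((1-2x)/2)^2 dx = 1/12
    spatial = 1.0 / 120.0 + 1.0 / 12.0
    total = 0.0
    for _ in range(n_samples):
        a = rng.uniform(alpha, beta)
        total += spatial / a**2
    return total / n_samples
```

Whatever the samples drawn, the estimate lies between the deterministic values obtained with $A = \beta$ and $A = \alpha$, which reflects the role of the lower bound $\alpha$ in the ellipticity constant $c'$.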

There is no conceptual difference between deterministic and stochastic boundary 
value problems. The Lax-Milgram theorem is used to establish conditions under 
which these problems admit unique weak solutions. The analogy between deter- 
ministic and stochastic problems extends to numerical solutions. For deterministic 
BVPs, the physical space needs to be discretized by, for example, finite differences
or finite elements, such that the original problem with an uncountable number of 
degrees of freedom is approximated by a problem with a finite number of degrees 
of freedom. For stochastic BVPs, both the physical space and the probability space 
need to be discretized. The physical space is discretized as for deterministic prob- 
lems. The discretization of the probability space is usually achieved by representing 
the random fields in the definition of these equations by parametric models, that is, 
deterministic functions of space depending on a finite number of random variables. 
Truncated Karhunen-Loeve series are typically used to construct parametric models. 
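
The following is a minimal sketch of how a truncated Karhunen-Loeve series turns a random field into a parametric model with a finite number of random variables. The exponential covariance $\exp(-|s-t|)$ on $(0, 1)$, the uniform grid, the Gaussian coefficients, and the helper names `truncated_kl` and `sample_field` are assumptions made for illustration; they are not taken from the text.

```python
import numpy as np

def truncated_kl(cov, x, n_terms):
    """Leading discrete Karhunen-Loeve pairs of a zero-mean field on a uniform grid x."""
    dx = x[1] - x[0]                                  # uniform grid spacing (assumed)
    C = cov(x[:, None], x[None, :])                   # covariance matrix on the grid
    lam, phi = np.linalg.eigh(C * dx)                 # discretised integral eigenproblem
    order = np.argsort(lam)[::-1][:n_terms]           # keep the n_terms largest eigenvalues
    lam = np.clip(lam[order], 0.0, None)
    phi = phi[:, order] / np.sqrt(dx)                 # modes normalised in L2(0, 1)
    return lam, phi, np.trace(C) * dx                 # last entry: total variance on the grid

def sample_field(lam, phi, rng, n_samples):
    """Samples of sum_k sqrt(lam_k) Z_k phi_k(x) with independent Z_k ~ N(0, 1)."""
    z = rng.standard_normal((n_samples, lam.size))
    return (z * np.sqrt(lam)) @ phi.T

x = np.linspace(0.0, 1.0, 201)
cov = lambda s, t: np.exp(-np.abs(s - t))             # hypothetical covariance function
lam, phi, total = truncated_kl(cov, x, n_terms=10)
fraction = lam.sum() / total                          # share of variance kept by 10 terms
samples = sample_field(lam, phi, np.random.default_rng(0), n_samples=5)
print(f"variance fraction captured by 10 terms: {fraction:.3f}")
```

The probability space is discretized by the ten coefficients `z`; the physical space is discretized separately by the grid `x`.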

The essential differences between deterministic and stochastic boundary value 
problems relate to the spaces in which we seek solutions and the measures and norms 
employed in these spaces. The solution for deterministic BVPs belongs to a subspace
of $H^m(D)$ endowed with the Lebesgue measure and the norm in (9.37). The solution
for stochastic BVPs belongs to a subspace of the product of two measure spaces, the
physical space $(D, \mathcal{B}(D), \lambda)$ and the probability space $(\Omega, \mathcal{F}, P)$. The measure on
this space is the product measure $\lambda\times P$, rather than the Lebesgue measure, and the
norm is given by (9.49).

We conclude this section with a brief discussion on the discrepancy between weak
solutions of stochastic boundary value problems that have the same functional form
but different coefficients and right sides. Let $U$ be the solution of (9.54) and let
$\tilde U$ be the solution of $\tilde{\mathcal{B}}(\tilde U, W) = E\big[\int_D \tilde f\,W\,dx\big]$, where $\tilde{\mathcal{B}}$ is defined by (9.56)
with $\tilde a$ in place of $a$. It is assumed that $\tilde a$ and $\tilde f$ have the same properties as $a$ and
$f$ and are defined on the same probability space as these random functions, so that
$\tilde{\mathcal{B}}(\tilde U, W) = E\big[\int_D \tilde f\,W\,dx\big]$ admits a unique solution. The following result relates
differences between $U$ and $\tilde U$ to those between $(a, f)$ and $(\tilde a, \tilde f)$.

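Theorem 9.4 The discrepancy between $U$ and $\tilde U$ can be bounded by

$$\|U - \tilde U\|_{\mathcal{W}(D,\Omega)} \le \frac{1+c}{\alpha}\Big(\|a - \tilde a\|_{L^\infty(D\times\Omega)}\,\|U\|_{\mathcal{W}(D,\Omega)} + \|f - \tilde f\|_{L^2(D\times\Omega)}\Big), \tag{9.57}$$

where $\alpha>0$ and $c>0$ are constants such that $P(\inf_{x\in D}\{\tilde a(x)\}\ge\alpha) = 1$, and $c$ is the
constant in the Poincare-Friedrichs inequality used in the proof.

Proof Since $U - \tilde U\in\mathcal{W}(D,\Omega)$, $\tilde{\mathcal{B}}(U - \tilde U, W)$ is defined and

$$\begin{aligned}
|\tilde{\mathcal{B}}(U - \tilde U, W)| &= |\tilde{\mathcal{B}}(U, W) - \mathcal{B}(U, W) + \mathcal{B}(U, W) - \tilde{\mathcal{B}}(\tilde U, W)| \\
&\le |\tilde{\mathcal{B}}(U, W) - \mathcal{B}(U, W)| + |(f - \tilde f, W)_{L^2(D\times\Omega)}| \\
&\le \Big|E\Big[\int_D (\tilde a - a)\,\nabla U\cdot\nabla W\,dx\Big]\Big| + \|f - \tilde f\|_{L^2(D\times\Omega)}\,\|W\|_{L^2(D\times\Omega)} \\
&\le \Big(E\Big[\int_D (\tilde a - a)^2\,\nabla U\cdot\nabla U\,dx\Big]\;E\Big[\int_D \nabla W\cdot\nabla W\,dx\Big]\Big)^{1/2} + \|f - \tilde f\|_{L^2(D\times\Omega)}\,\|W\|_{L^2(D\times\Omega)} \\
&\le \Big(\|a - \tilde a\|_{L^\infty(D\times\Omega)}\,\|U\|_{\mathcal{W}(D,\Omega)} + \|f - \tilde f\|_{L^2(D\times\Omega)}\Big)\,\|W\|_{\mathcal{W}(D,\Omega)},
\end{aligned}$$

where we use $\|\nabla V\|^2_{L^2(D\times\Omega)}\le\|V\|^2_{\mathcal{W}(D,\Omega)}$ and $\|V\|^2_{L^2(D\times\Omega)}\le\|V\|^2_{\mathcal{W}(D,\Omega)}$ for $V\in\mathcal{W}(D,\Omega)$. The
Poincare-Friedrichs inequality ([18], Sect. 7.5, Theorem 9) implies the existence
of a constant $c>0$ such that $\|W\|^2_{L^2(D\times\Omega)}\le c\,\|\nabla W\|^2_{L^2(D\times\Omega)}$, so that
$\|W\|^2_{\mathcal{W}(D,\Omega)} = \|W\|^2_{L^2(D\times\Omega)} + \|\nabla W\|^2_{L^2(D\times\Omega)}\le(1+c)\,\|\nabla W\|^2_{L^2(D\times\Omega)}$. Since
$\tilde{\mathcal{B}}(W,W) = E\big[\int_D \tilde a\,\nabla W\cdot\nabla W\,dx\big]\ge\alpha\,\|\nabla W\|^2_{L^2(D\times\Omega)}$, we have
$\tilde{\mathcal{B}}(W,W)\ge c'\,\|W\|^2_{\mathcal{W}(D,\Omega)}$, where $c' = \alpha/(1+c)$. Setting $W = U - \tilde U$ in the
above inequalities gives

$$c'\,\|U - \tilde U\|^2_{\mathcal{W}(D,\Omega)} \le \tilde{\mathcal{B}}(U - \tilde U, U - \tilde U) \le \Big(\|a - \tilde a\|_{L^\infty(D\times\Omega)}\,\|U\|_{\mathcal{W}(D,\Omega)} + \|f - \tilde f\|_{L^2(D\times\Omega)}\Big)\,\|U - \tilde U\|_{\mathcal{W}(D,\Omega)},$$

which gives the bound in (9.57) following division by $c'\,\|U - \tilde U\|_{\mathcal{W}(D,\Omega)}$. ■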

Consistent with our intuition, the discrepancy between $U$ and $\tilde U$ depends on
differences between $(a, f)$ and $(\tilde a, \tilde f)$, and can be bounded by the sum of the norms
$\|a - \tilde a\|_{L^\infty(D\times\Omega)}$ and $\|f - \tilde f\|_{L^2(D\times\Omega)}$ weighted with some constants. We can view
$\tilde U$ as an approximation of $U$ that can be, for example, a numerical solution of (9.54).
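
A bound of this type can be checked numerically in a simple deterministic special case, sketched below under stated assumptions: the one-dimensional problem $-(a\,u')' = f$ on $(0, 1)$ with $u(0) = u(1) = 0$, piecewise-constant coefficients on a uniform mesh, linear finite elements, $\tilde f = f = 1$, and the Poincare constant $c = 1/\pi^2$ of the unit interval. The coefficients `a` and `a_t` and the helper `p1_matrix` are hypothetical choices for illustration only.

```python
import numpy as np

def p1_matrix(coef, h, kind):
    """Assemble a P1 matrix on a uniform mesh of (0, 1), interior nodes only.

    kind="stiffness": entries int coef phi_i' phi_j' dx (coef constant per element)
    kind="mass":      entries int coef phi_i  phi_j  dx
    """
    n = coef.size - 1                                   # number of interior nodes
    local = {"stiffness": np.array([[1.0, -1.0], [-1.0, 1.0]]) / h,
             "mass": np.array([[2.0, 1.0], [1.0, 2.0]]) * h / 6.0}[kind]
    A = np.zeros((n, n))
    for e, c_e in enumerate(coef):                      # element e joins nodes e and e + 1
        for r, i in enumerate((e - 1, e)):              # interior indices of the two nodes
            for s, j in enumerate((e - 1, e)):
                if 0 <= i < n and 0 <= j < n:
                    A[i, j] += c_e * local[r, s]
    return A

n_el = 200
h = 1.0 / n_el
xm = (np.arange(n_el) + 0.5) * h                        # element midpoints
a = 1.0 + 0.5 * np.sin(2.0 * np.pi * xm)                # hypothetical coefficient a
a_t = a + 0.05 * np.cos(6.0 * np.pi * xm)               # hypothetical perturbed coefficient
alpha = a_t.min()                                       # lower bound on the perturbed coefficient

K, K_t = p1_matrix(a, h, "stiffness"), p1_matrix(a_t, h, "stiffness")
H1 = p1_matrix(np.ones(n_el), h, "stiffness") + p1_matrix(np.ones(n_el), h, "mass")
b = h * np.ones(n_el - 1)                               # load vector for f = 1
u, u_t = np.linalg.solve(K, b), np.linalg.solve(K_t, b)

norm = lambda w: np.sqrt(w @ H1 @ w)                    # H1 norm of a P1 function
c = 1.0 / np.pi ** 2                                    # Poincare constant on (0, 1)
lhs = norm(u - u_t)
rhs = (1.0 + c) / alpha * np.max(np.abs(a - a_t)) * norm(u)
print(f"discrepancy {lhs:.4e} <= bound {rhs:.4e}")
```

The same argument as in the proof applies within the finite element subspace, so the inequality holds for the discrete solutions as well; the bound is typically far from sharp.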

Consider first the deterministic version (9.44) of (9.54), and suppose that the
domain $D$ is partitioned into triangular finite elements $\{T_r\}$ with the largest mesh size
$h = \max_r\{\mathrm{diam}(T_r)\} > 0$. The finite element space

$$\mathcal{W}_h(D) = \{W : D\to\mathbb{R} \text{ continuous and linear in each } T_r \text{ and } W = 0 \text{ on } \partial D\}, \tag{9.58}$$

is a subspace of $\mathcal{W}(D)$ so that $\mathcal{B}(U_h, W) = (f, W)_{L^2(D)}$, $W\in\mathcal{W}_h(D)$, has a
unique solution. The discrepancy between the solution $U$ of (9.44) and its finite
element approximation $U_h\in\mathcal{W}_h(D)$ can be bounded by

$$\|U - U_h\|_{\mathcal{W}(D)} \le c\,h, \tag{9.59}$$

where $c>0$ is a constant that does not depend on $h$ ([9], Chap. 2). Arguments
yielding this inequality extend directly to finite element solutions for the stochastic
problem (9.54) provided the random fields $a(x,\omega)$ and $f(x,\omega)$ are parametric models
depending on a random vector $Z$ with finite dimension. These fields can be viewed
as deterministic functions $a(x,y)$ and $f(x,y)$ of $(x,y)\in D\times\Gamma$, $\Gamma = Z(\Omega)$, so that
(9.54) can be interpreted as a deterministic equation that can be solved by standard
finite element algorithms using an extension of $\mathcal{W}_h(D)$ that includes approximating
functions defined on $D\times\Gamma$ [20]. Numerical solutions under this formulation
are feasible only for problems with small stochastic dimensions, that is, problems
depending on vectors $Z$ that have just a few coordinates [21].


9.4.4 Monte Carlo Simulation 

Samples $U(x,t,\omega)$ of the solution $U(x,t)$ of (9.33), that is, deterministic solutions
of this equation corresponding to samples of $\mathcal{L}$ and $V$, and end conditions, are used
to estimate properties of $U(x,t)$. The generation of samples of the random elements
in (9.33) is usually based on parametric models, that is, deterministic functions of
space and/or time depending on a finite number of random variables that may or
may not be independent. Methods for generating samples of random elements are in
Sects. 2.13 and 3.8.

Let $U^{(n)}(x,t)$ denote the solution of (9.33) with all its random elements approximated
by parametric models depending on $n$ random variables. Properties of $U(x,t)$
are estimated from samples of $U^{(n)}(x,t)$. A Monte Carlo simulation algorithm is
useful if $U^{(n)}(x,t)$ converges in some sense to $U(x,t)$ as the discretization of the
probability space is refined, that is, $n\to\infty$. The performance of Monte Carlo
estimators for properties of $U(x,t)$ depends on the accuracy of the parametric models
and the sample size, that is, the number of samples of $U^{(n)}(x,t)$ used for estimation.

Example 9.6 Let U(x,t) be a real-valued random function satisfying the stochastic 
partial differential equation (SPDE) 



with boundary and initial conditions $U(0,t) = U(l,t) = 0$, $t\ge 0$, and $U(x,0) = 0$,
$x\in(0,l)$, where $V(x,t) = \sin(\nu t)$, $\nu>0$, $\Sigma(x) = F^{-1}\circ\Phi(G(x))$ is
a translation random field, $F$ denotes a distribution, and $G(x)$ is a homogeneous
Gaussian field with mean 0 and covariance function $\rho(\xi) = E[G(x+\xi)\,G(x)]
= (1+\lambda|\xi|)\exp(-\lambda|\xi|)$, $\lambda>0$, and one-sided spectral density
$g(\nu) = 4\lambda^3/[\pi(\nu^2+\lambda^2)^2]$, $\nu\ge 0$. This SPDE is a special case of (9.15) for $d = 1$.

The construction of a Monte Carlo algorithm for estimating properties of $U(x,t)$
involves three steps. First, a sequence of homogeneous Gaussian fields $G^{(n)}(x)$,
$n = 1, 2, \ldots$, with mean 0 and variance 1, that converges in some sense to $G(x)$ is
constructed and used to generate approximate samples of $G(x)$. Second, samples of
$G^{(n)}(x)$ are mapped into samples of $\Sigma^{(n)}(x) = F^{-1}\circ\Phi(G^{(n)}(x))$, that is, samples
of a sequence of translation random fields approximating $\Sigma(x)$. Third, samples of
the solution $U^{(n)}(x,t)$ of (9.60) with $\Sigma^{(n)}(x)$ in place of $\Sigma(x)$ are calculated by
deterministic solvers, and used subsequently to estimate properties of $U(x,t)$.

Fig. 9.3 A sample of $U^{(n)}(x,t)$ corresponding to a sample $\Sigma^{(n)}(x,\omega)$ of $\Sigma^{(n)}(x)$ (top left panel)
and the spatial average of $\Sigma^{(n)}(x,\omega)$ (top right panel). The bottom panels are contour lines of the
plots in the top panels

Let $\bar\nu>0$ be a sufficiently large frequency such that $\int_0^{\bar\nu} g(\nu)\,d\nu\simeq 1$ and set
$G^{(n)}(x) = \sum_{k=1}^n \sigma_k^{(n)}\,(A_k\cos(\nu_k x) + B_k\sin(\nu_k x))$, where $\nu_k = (k - 1/2)\,\bar\nu/n$, $A_k$,
$B_k$ are independent $N(0,1)$ variables, and $\sigma_k^{(n)}$ is the square root of the area under $g(\nu)$
in the frequency band $(\nu_k - \bar\nu/(2n),\ \nu_k + \bar\nu/(2n))$. Under mild conditions, as $n\to\infty$
and $\bar\nu\to\infty$, $G^{(n)}(x)$ and $\Sigma^{(n)}(x)$ become versions of $G(x)$ and $\Sigma(x)$, respectively,
$\Sigma^{(n)}(x)$ converges to $\Sigma(x)$ in the mean square sense for each $x\in(0,l)$, and
$U^{(n)}(x,t)$ converges in mean square to $U(x,t)$ in any bounded time interval.
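
The first two steps of the algorithm can be sketched as follows, using only the Python standard library. The sketch assumes $\lambda = 10$, a cutoff frequency of 30, $n = 100$ frequencies, and a Beta distribution on $(0.01, 6)$ with unit shape parameters, which reduces to a uniform distribution so that its inverse is available in closed form. The function names are hypothetical, and the deterministic solver of the third step is not shown.

```python
import math
import random

def band_area(lo, hi, lam):
    """Area under g(v) = 4 lam^3 / (pi (v^2 + lam^2)^2) between lo and hi."""
    G = lambda v: (2.0 / math.pi) * (math.atan(v / lam) + lam * v / (v * v + lam * lam))
    return G(hi) - G(lo)

def spectral_model(n, v_bar, lam):
    """Frequencies and amplitudes of the discrete spectral representation."""
    dv = v_bar / n
    freqs = [(k - 0.5) * dv for k in range(1, n + 1)]
    areas = [band_area(v - dv / 2.0, v + dv / 2.0, lam) for v in freqs]
    total = sum(areas)                       # rescale the truncated density to unit area
    return freqs, [math.sqrt(a / total) for a in areas]

def sample_gaussian(xs, freqs, sig, rng):
    """One sample of G^(n) on the points xs."""
    A = [rng.gauss(0.0, 1.0) for _ in freqs]
    B = [rng.gauss(0.0, 1.0) for _ in freqs]
    return [sum(s * (a * math.cos(v * x) + b * math.sin(v * x))
                for v, s, a, b in zip(freqs, sig, A, B)) for x in xs]

def translate(g, a=0.01, b=6.0):
    """F^{-1} o Phi for a Beta law on (a, b) with unit shape parameters (uniform)."""
    phi = 0.5 * (1.0 + math.erf(g / math.sqrt(2.0)))   # standard normal distribution
    return a + (b - a) * phi

lam, v_bar, n = 10.0, 30.0, 100
freqs, sig = spectral_model(n, v_bar, lam)
xs = [i / 10.0 for i in range(11)]
g_sample = sample_gaussian(xs, freqs, sig, random.Random(1))
sigma_sample = [translate(g) for g in g_sample]        # conductivity sample on xs

xi = 0.05                                              # lag used to compare covariances
cov_model = sum(s * s * math.cos(v * xi) for v, s in zip(freqs, sig))
cov_target = (1.0 + lam * xi) * math.exp(-lam * xi)
print(f"covariance at lag {xi}: model {cov_model:.4f}, target {cov_target:.4f}")
```

The covariance of the parametric model, $\sum_k \sigma_k^2\cos(\nu_k\xi)$, can be compared directly with the target $\rho(\xi)$ to judge whether the cutoff frequency and $n$ are adequate.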

The plots in Figs. 9.3 and 9.4 are for $l = 1$, a Beta distribution $F$ with range $(a = 0.01, b = 6.0)$
and shape parameters $\gamma = \eta = 1$, $\lambda = 10$, and a finite difference
scheme with space and time steps $\Delta x = l/10$ and $\Delta t = 1.5/2000$. The parametric
model $G^{(n)}(x)$ for $G(x)$ has been obtained by truncating the spectral density of $G(x)$ at
$\bar\nu = 30$, scaling the truncated spectral density to have unit area, and approximating it
by a discrete spectral density with $n = 100$ equally spaced frequencies. The parametric
model for conductivity is $\Sigma^{(n)}(x) = F^{-1}\circ\Phi(G^{(n)}(x))$. The top left and right panels
in Fig. 9.3 show a sample of $U^{(n)}(x,t)$ corresponding to a sample $\Sigma^{(n)}(x,\omega)$ of
$\Sigma^{(n)}(x)$ and a sample of $U^{(n)}(x,t)$ corresponding to the spatial average
$\bar\Sigma^{(n)}(\omega) = (1/l)\int_0^l \Sigma^{(n)}(x,\omega)\,dx$ of conductivity sample $\Sigma^{(n)}(x,\omega)$. The bottom panels in the
figure are contour lines of the three-dimensional plots in the top panels. Figure 9.4
provides the same information for another sample of $\Sigma^{(n)}(x)$. The plots show that
$U^{(n)}(x,t)$ exhibits a notable sample-to-sample variation and the solution of (9.60)
cannot be approximated by that of this equation corresponding to a homogeneous
specimen with properties given by the spatial average of $\Sigma^{(n)}(x)$. ◇

Fig. 9.4 A sample of $U^{(n)}(x,t)$ corresponding to a sample $\Sigma^{(n)}(x,\omega)$ of $\Sigma^{(n)}(x)$ (top left panel)
and the spatial average of $\Sigma^{(n)}(x,\omega)$ (top right panel). The bottom panels are contour lines of the
plots in the top panels

Proof That $G^{(n)}(x)$ becomes a version of $G(x)$ as $n\to\infty$ and $\bar\nu\to\infty$ follows from
Theorem 3.46. Since

$$\begin{aligned}
P\Big(\bigcap_{i=1}^m \{\Sigma^{(n)}(x_i)\le z_i\}\Big) &= P\Big(\bigcap_{i=1}^m \{G^{(n)}(x_i)\le\Phi^{-1}\circ F(z_i)\}\Big) \\
&\to P\Big(\bigcap_{i=1}^m \{G(x_i)\le\Phi^{-1}\circ F(z_i)\}\Big) = P\Big(\bigcap_{i=1}^m \{\Sigma(x_i)\le z_i\}\Big), \quad n\to\infty,\ \bar\nu\to\infty,
\end{aligned}$$

holds for any integer $m\ge 1$, arbitrary arguments $x_i$, and $z_i\in\mathbb{R}$ by properties of
$G^{(n)}(x)$, $\Sigma^{(n)}(x)$ becomes a version of $\Sigma(x)$ as $n\to\infty$ and $\bar\nu\to\infty$.

If $F^{-1}\circ\Phi$ is Lipschitz continuous, we have

$$|\Sigma^{(n)}(x) - \Sigma(x)| = |F^{-1}\circ\Phi(G^{(n)}(x)) - F^{-1}\circ\Phi(G(x))| \le c\,|G^{(n)}(x) - G(x)|, \tag{9.61}$$

where $c>0$ is a constant. This implies $E[|\Sigma^{(n)}(x) - \Sigma(x)|^2]\le c^2\,E[|G^{(n)}(x) - G(x)|^2]$,
so that $E[|\Sigma^{(n)}(x) - \Sigma(x)|^2]\to 0$ by the mean square convergence of
$G^{(n)}(x)$ to $G(x)$, that is, $\Sigma^{(n)}(x)$ converges in the mean square sense to $\Sigma(x)$.

Let $(\xi_1,\ldots,\xi_d)$ be discrete points in $(0,l)$ defining the nodes of a finite difference
mesh used to approximate the spatial derivatives of $U(x,t)$. The $\mathbb{R}^d$-valued stochastic
process $Y(t) = (Y_1(t) = U(\xi_1,t),\ldots,Y_d(t) = U(\xi_d,t))'$ satisfies the ordinary
differential equation $\dot Y(t) = A\,Y(t) + W(t)$, $t\ge 0$, where $A$ denotes a $(d,d)$
random matrix that incorporates the boundary conditions for (9.60) and $W(t) =
(W_1(t) = V(\xi_1,t),\ldots,W_d(t) = V(\xi_d,t))'$. Consider also the $\mathbb{R}^d$-valued process
$Y^{(n)}(t)$ defined by $\dot Y^{(n)}(t) = A^{(n)}\,Y^{(n)}(t) + W(t)$ in which $A^{(n)}$ is $A$ with $\Sigma^{(n)}$ in place
of $\Sigma$. If the largest eigenvalue $\Lambda_{\max}$ of $(A + A')/2$ is such that $P(\Lambda_{\max} < 0) = 1$,
the discrepancy between $Y^{(n)}(t)$ and $Y(t)$ depends on that between $A$ and $A^{(n)}$, and
can be bounded by

$$\|Y(t) - Y^{(n)}(t)\| \le e^{\Lambda_{\max} t}\int_0^t e^{-\Lambda_{\max} s}\,\|(A - A^{(n)})\,Y^{(n)}(s)\|\,ds,$$

where $\|\cdot\|$ is the Euclidean norm (Theorem 7.28). If there exists $M>0$ such that
$\|Y^{(n)}(t)\|\le M$ at all times a.s., we have

$$\|Y(t) - Y^{(n)}(t)\| \le \|A - A^{(n)}\|\int_0^t e^{\Lambda_{\max}(t-s)}\,\|Y^{(n)}(s)\|\,ds \le M\,\|A - A^{(n)}\|\,\frac{1 - e^{\Lambda_{\max} t}}{|\Lambda_{\max}|} \le \frac{M\,\|A - A^{(n)}\|}{|\Lambda_{\max}|}$$

since $\|(A - A^{(n)})\,Y^{(n)}(s)\|\le\|A - A^{(n)}\|\,\|Y^{(n)}(s)\|$. This gives $E[\|Y(t) - Y^{(n)}(t)\|^2]
\le (M/a)^2\,E[\|A - A^{(n)}\|^2]$, where $a>0$ is such that $P(|\Lambda_{\max}|\ge a) = 1$. Since
$E[\|A - A^{(n)}\|^2]$ is a sum of expectations $E[(\Sigma(\xi_p) - \Sigma^{(n)}(\xi_p))\,(\Sigma(\xi_q) - \Sigma^{(n)}(\xi_q))]$
with absolute value smaller than

$$\Big(E[(\Sigma(\xi_p) - \Sigma^{(n)}(\xi_p))^2]\;E[(\Sigma(\xi_q) - \Sigma^{(n)}(\xi_q))^2]\Big)^{1/2}$$

by the Cauchy-Schwarz inequality, we have $E[\|A - A^{(n)}\|^2]\to 0$ as $n\to\infty$,
which implies the mean square convergence of $Y^{(n)}(t)$ to $Y(t)$. ■

We have seen that Monte Carlo solutions of SPDEs require the representation of all
random elements in these equations by parametric models. Parametric models used
by the stochastic Galerkin and collocation methods are discussed later in this chapter.
The models used by these methods are required to satisfy additional conditions that
may limit their value in applications. The following example explores effects of these
conditions on the second moment properties of the solution of a stochastic elliptic
boundary value problem.

Fig. 9.5 Spectral density $s(\nu)$ of $G(x)$ for $\nu\in[-3,3]\times[-3,3]$

Example 9.7 Suppose $a(x)$, $x\in D = (0,l_1)\times(0,l_2)\subset\mathbb{R}^2$, in (9.53) is a Beta
translation random field defined by

$$a(x) = \alpha + (\beta - \alpha)\,F^{-1}_{\mathrm{Beta}(p,q)}\circ\Phi(G(x)) = h(G(x)), \quad x\in D, \tag{9.62}$$

where $F_{\mathrm{Beta}(p,q)}$ denotes the distribution of a standard Beta random variable with
shape parameters $(p,q)$, mean $p/(p+q)$, and variance $pq/[(p+q)^2(p+q+1)]$,
and $0<\alpha<\beta<\infty$ are constants. Note that $a(x)$ takes values in the interval $[\alpha,\beta]$,
so that (9.53) is an elliptic partial differential equation a.s. The image $G(x)$ of $a(x)$ is
a homogeneous Gaussian field with mean 0, variance 1, spectral density

$$s(\nu) = \frac{1}{2\pi\sqrt{1-\rho^2}}\exp\Big(-\frac{\nu_1^2 - 2\rho\,\nu_1\nu_2 + \nu_2^2}{2(1-\rho^2)}\Big), \quad \nu = (\nu_1,\nu_2)\in\mathbb{R}^2, \tag{9.63}$$

and covariance function

$$c(\tau) = E[G(x)\,G(x+\tau)] = \exp\Big(-\frac{\tau_1^2 + 2\rho\,\tau_1\tau_2 + \tau_2^2}{2}\Big), \quad \tau = (\tau_1,\tau_2)\in\mathbb{R}^2, \tag{9.64}$$

where $\rho\in(-1,1)$. Figure 9.5 shows the spectral density of $G(x)$ for $\rho = 0.7$ and
$\nu = (\nu_1,\nu_2)$ in the rectangle $[-3,3]\times[-3,3]$. Samples of $a(x)$ can be calculated from
samples of $G(x)$ that can be generated by various Monte Carlo algorithms (Sect. 3.8).
The second moment properties of $a(x)$ are $E[a(x)] = \alpha + (\beta - \alpha)\,p/(p+q)$
and $c_a(\tau) = E[(a(x+\tau) - E[a(x)])\,(a(x) - E[a(x)])] = E[(h(G(x+\tau)) -
E[a(x)])\,(h(G(x)) - E[a(x)])]$. Translation random fields of the type in (9.62)
can be used to describe the spatial variability of conductivity or other properties of
random microstructures.

Current implementations of the stochastic Galerkin and collocation methods
approximate $a(x)$ by linear parametric models $a^{(n)}(x)$, that is, finite sums of specified
deterministic functions of $x\in D$ with random coefficients (Sects. 9.4.8 and 9.4.9).
Truncated Karhunen-Loeve series or discrete spectral representations are commonly
used to construct linear parametric models. For example, the linear parametric model

$$a^{(n)}(x) = a_0 + \sum_{k=1}^n \sigma_k\,\big(C_k\cos(\nu^{(k)}\cdot x) + D_k\sin(\nu^{(k)}\cdot x)\big) \tag{9.65}$$

constitutes a discrete spectral representation of $a(x)$, where $\{\sigma_k\}$ and $a_0$ are constants,
$\nu^{(k)}\in\mathbb{R}^2$ denote frequencies, and $\{C_k, D_k\}$ are uncorrelated random variables with
mean 0 and variance 1. The mean and covariance functions of the parametric model
$a^{(n)}(x)$ are $E[a^{(n)}(x)] = a_0$ and $c_a^{(n)}(\tau) = E[(a^{(n)}(x+\tau) - a_0)\,(a^{(n)}(x) - a_0)] =
\sum_{k=1}^n \sigma_k^2\cos(\nu^{(k)}\cdot\tau)$.

The model size $n$ in (9.65) defines the dimension of the probability space, that is,
the number of random variables in the expression of $a^{(n)}(x)$, and is selected based
on computational considerations. The frequencies $\{\nu^{(k)}\}$ can be chosen at the centers
of rectangles $\{I_k\}$ partitioning the frequency band $I = [-\bar\nu,\bar\nu]^2$, $0<\bar\nu<\infty$, of
the spectral density $s_a(\nu)$ of $a(x)$, that is, $I = \bigcup_{k=1}^n I_k$ and $I_k\cap I_l = \emptyset$ for $k\ne l$,
where $\bar\nu$ is such that $\int_I s_a(\nu)\,d\nu\simeq 1$. The second moment properties of $a^{(n)}(x)$ with
$\sigma_k^2 = \int_{I_k} s_a(\nu)\,d\nu$ converge to those of $a(x)$ as $n\to\infty$ and $\bar\nu\to\infty$ (Sect. 3.8.1.2), so
that $a^{(n)}(x)$ with $a_0 = E[a(x)]$ is approximately equal to $a(x)$ in the second moment
sense for sufficiently large $n$ and $\bar\nu$.

However, the samples of $a^{(n)}(x)$ and $a(x)$ may have very different properties.
For example, $a(x)$ takes values in $[\alpha,\beta]$ while the samples of $a^{(n)}(x)$ may leave
this interval with a non-zero probability that depends on the distributions postulated
for $\{C_k, D_k\}$. To confine the samples of $a^{(n)}(x)$ to a bounded interval in $(0,\infty)$ so
that (9.53) with $a^{(n)}(x)$ in place of $a(x)$ is elliptic, it is commonly assumed in the
stochastic Galerkin and collocation methods that the distributions of $\{C_k, D_k\}$ have
bounded supports that depend on $n$ and are such that $a^{(n)}(x)$ has strictly positive
samples. A version of this model is the random field

$$a^{(n,\zeta)}(x) = a_0 + \zeta\sum_{k=1}^n \sigma_k\,\big(C_k\cos(\nu^{(k)}\cdot x) + D_k\sin(\nu^{(k)}\cdot x)\big), \tag{9.66}$$

which resembles $a^{(n)}(x)$ except that its random part is scaled by a factor $0<\zeta<1$
such that $\min_{x\in D}\{a^{(n,\zeta)}(x)\} > 0$ a.s. The factor $\zeta$ depends on $n$ and the distributions
of $\{C_k, D_k\}$. Note that $a^{(n,\zeta)}(x)$ with $a_0 = E[a(x)]$ has the same mean as $a(x)$
and that the covariance function of this field scaled by $\zeta^2$, that is, the function
$c_a^{(n,\zeta)}(\tau)/\zeta^2$, converges to $c_a(\tau)$ as $n\to\infty$ and $\bar\nu\to\infty$, so that the approximation
$c_a^{(n,\zeta)}(\tau)/\zeta^2\simeq c_a(\tau)$ is satisfactory for sufficiently large values of $n$ and $\bar\nu$. The
variance of $a^{(n,\zeta)}(x)$ can be much smaller than that of the target random field $a(x)$
since $\mathrm{Var}[a^{(n,\zeta)}(x)]\simeq\zeta^2\,\mathrm{Var}[a(x)]$; it approaches 0 as $n$ increases indefinitely since
$a^{(n)}(x)$ with independent random variables $\{C_k, D_k\}$ of bounded support becomes
a Gaussian field as $n\to\infty$ (Sect. 6.3.3.1). Moreover, the distribution of $a^{(n,\zeta)}(x)$
depends on $n$ and differs from that of $a(x)$.

In summary, the model $a^{(n,\zeta)}(x)$ of $a(x)$ in (9.66) can be tuned to have the same
mean as $a(x)$. Its scaled covariance function $c_a^{(n,\zeta)}(\tau)/\zeta^2$ approximates satisfactorily
the covariance function $c_a(\tau)$ of $a(x)$ for adequate values of $n$ and $\bar\nu$. The variance of
$a^{(n,\zeta)}(x)$ can be much smaller than that of $a(x)$ since $\mathrm{Var}[a^{(n,\zeta)}(x)]\simeq\zeta^2\,\mathrm{Var}[a(x)]$.
For example, if $\{C_k, D_k\}$ take values in $(-\bar a,\bar a)$, $0<\bar a<\infty$, $k = 1,\ldots,n$, then
$|a^{(n,\zeta)}(x) - a_0|\le\zeta\sum_{k=1}^n \sigma_k\,(C_k^2 + D_k^2)^{1/2}\le\sqrt{2}\,\zeta\sum_{k=1}^n \sigma_k\,\bar a$, so that $a^{(n,\zeta)}(x)$ has
strictly positive samples if $a_0 - \sqrt{2}\,\zeta\sum_{k=1}^n \sigma_k\,\bar a > 0$ or $\zeta < a_0/(\sqrt{2}\sum_{k=1}^n \sigma_k\,\bar a)$.
Also, the distributions of $a(x)$ and $a^{(n,\zeta)}(x)$ differ. ◇
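
The positivity condition on the scale factor can be illustrated with a short sketch. It assumes $n = 4$ frequencies at the centers of a 2 by 2 partition of $[-3,3]^2$, equal amplitudes chosen so that their squares add up to the variance of $a(x)$ (a simplification; the text obtains the amplitudes from integrals of the spectral density), coefficients uniform on $(-\sqrt{3},\sqrt{3})$, and the Beta parameters $\alpha = 0.25$, $\beta = 100$, $p = 1$, $q = 3$ on a $12\times 6$ domain. The names `zeta_limit` and `scaled_model` are hypothetical.

```python
import math
import random

def zeta_limit(a0, sigmas, a_bar):
    """Upper limit a0 / (sqrt(2) * a_bar * sum_k sigma_k) on the scale factor."""
    return a0 / (math.sqrt(2.0) * a_bar * sum(sigmas))

def scaled_model(x, a0, zeta, sigmas, freqs, C, D):
    """Value at x = (x1, x2) of the scaled linear parametric model."""
    total = 0.0
    for sig, v, c, d in zip(sigmas, freqs, C, D):
        phase = v[0] * x[0] + v[1] * x[1]
        total += sig * (c * math.cos(phase) + d * math.sin(phase))
    return a0 + zeta * total

alpha, beta, p, q = 0.25, 100.0, 1.0, 3.0
a0 = alpha + (beta - alpha) * p / (p + q)                         # mean of a(x)
var_a = (beta - alpha) ** 2 * p * q / ((p + q) ** 2 * (p + q + 1))  # variance of a(x)

freqs = [(-1.5, -1.5), (-1.5, 1.5), (1.5, -1.5), (1.5, 1.5)]      # assumed frequencies
sigmas = [math.sqrt(var_a / len(freqs))] * len(freqs)             # assumed equal amplitudes
a_bar = math.sqrt(3.0)                                            # support bound of C_k, D_k
zeta = 0.99 * zeta_limit(a0, sigmas, a_bar)                       # just inside the limit

rng = random.Random(0)
grid = [(12.0 * i / 10, 6.0 * j / 10) for i in range(11) for j in range(11)]
lowest = float("inf")
for _ in range(200):                                              # 200 model samples
    C = [rng.uniform(-a_bar, a_bar) for _ in freqs]
    D = [rng.uniform(-a_bar, a_bar) for _ in freqs]
    lowest = min(lowest, min(scaled_model(x, a0, zeta, sigmas, freqs, C, D) for x in grid))

variance = zeta ** 2 * sum(s * s for s in sigmas)                 # variance of the scaled model
print(f"zeta = {zeta:.4f}, smallest sampled value = {lowest:.3f}")
print(f"variance ratio = {variance / var_a:.4f}")
```

With these choices the scale factor is roughly a quarter, so the model retains well under a tenth of the target variance, which illustrates the loss of variability discussed above.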

Example 9.8 Let $U(x)$ denote the temperature field in a rectangular specimen
satisfying the stochastic partial differential equation (9.53) with $f(x) = 0$, $D =
(0,l_1)\times(0,l_2)$, $l_1 = 12$, $l_2 = 6$, $U(0,x_2) = 0$ and $U(l_1,x_2) = 1$ for $x_2\in(0,l_2)$,
and $\partial U(x_1,0)/\partial x_2 = \partial U(x_1,l_2)/\partial x_2 = 0$ for $x_1\in(0,l_1)$. The conductivity field
$a(x)$ is given by (9.62), (9.63), and (9.64) with $\alpha = 0.25$, $\beta = 100$, $p = 1$, $q = 3$,
$\rho = 0.7$, and $\bar\nu = 3$. The coordinates of the frequencies $\nu^{(k)}$ in (9.65) are
$\nu_r = -\bar\nu + (k - 1/2)\,(2\bar\nu/n)$, $r = 1, 2$. Since the difference between the covariance
function $c(\tau)$ of $G(x)$ and the scaled covariance $c_a(\tau)/c_a(0)$ of $a(x)$ is small
([22], Sect. 3.1.1), we assume $c_a(\tau)/c_a(0)\simeq c(\tau)$.

Let $U^{(n,\zeta)}(x)$ be the solution of (9.53) with $a^{(n,\zeta)}(x)$ in place of $a(x)$. The random
variables in the definition of $a^{(n,\zeta)}(x)$ are assumed to be independent, uniformly
distributed with mean 0 and variance 1, that is, $C_k, D_k\sim U(-\sqrt{3},\sqrt{3})$, $k =
1,\ldots,n$. The scale factor $\zeta$ is selected such that less than approximately 5% of
the samples of $a^{(n,\zeta)}(x)$ take negative values in $D$, for example, $\zeta = 0.3286$ for
$n = 4$. The samples of $a^{(n,\zeta)}(x)$ taking negative values in $D$ are not used to calculate
samples of $U^{(n,\zeta)}(x)$, so that the reported estimates are for the random field
$U^{(n,\zeta)}(x)\,1(\min_{x\in D}\{a^{(n,\zeta)}(x)\} > 0)$.

Properties of solutions $U(x)$ and $U^{(n,\zeta)}(x)$ are estimated from samples of these
fields obtained by solving (9.53) with $f(x) = 0$ for 800 independent samples of $a(x)$ and
$a^{(n,\zeta)}(x)$, respectively. Figure 9.6 shows a sample of the Beta translation conductivity
$a(x)$ (top left panel) and samples of $a^{(n,\zeta)}(x)$ for several values of $n$. The skewness and
kurtosis coefficients of $a(x)$ are 0.8589 and 3.0952 ([23], Chap. 24). Their estimates
obtained from 800 samples are 0.9546 and 3.4143. Estimates of skewness and kurtosis
coefficients for $a^{(n,\zeta)}(x)$ based also on 800 independent samples of this field are
0.0602 and 2.4714 for $n = 4$, and -0.0797 and 2.7855 for $n = 576$. While the
density of $a(x)$ is skewed, the density of $a^{(n,\zeta)}(x)$ is approximately symmetric about
its mean. Figure 9.7 shows the covariance functions of $a(x)$ (top left panel) and
$a^{(n)}(x)$ for several values of $n$. The covariance function of $a^{(n)}(x)$ coincides with that
of $(a^{(n,\zeta)}(x) - a_0)/\zeta$ and approaches the covariance function of $a(x)$ as $n$ increases,
in agreement with observations in Example 9.7. Figure 9.8 shows the expectation of
$U(x)$ (top left panel) and samples of $U^{(n,\zeta)}(x)$ for $n = 4, 16, 100, 256$, and 576. The
samples of $U^{(n,\zeta)}(x)$ for these values of $n$ nearly coincide with $E[U(x)]$. Figure 9.9
shows the standard deviation of $U(x)$ and the standard deviations of $U^{(n,\zeta)}(x)$ for
$n = 4, 16, 100, 256$, and 576. The standard deviation of $U^{(n,\zeta)}(x)$ is much smaller
than that of $U(x)$ for all values of $n$ in the figure. The dependence of the spatial
variation of the standard deviation of $U^{(n,\zeta)}(x)$ on $n$ is consistent with the facts that
the probability law of $a^{(n,\zeta)}(x)$ is a function of $n$ and the mapping $a^{(n,\zeta)}\mapsto U^{(n,\zeta)}$
is nonlinear.

Fig. 9.6 A sample of $a(x)$ (top left panel) and samples of $a^{(n,\zeta)}(x)$ for $n = 4, 16, 100, 256$, and 576
Fig. 9.7 Covariance functions of $a(x)$ (top left panel) and $a^{(n)}(x)$ for $n = 4, 16, 100, 256$, and 576


The discrepancy between the solutions $U(x)$ and $U^{(n,\zeta)}(x)$ can be bounded by
(9.57) with $\|a - \tilde a\|_{L^\infty(D\times\Omega)} = \|a - a^{(n,\zeta)}\|_{L^\infty(D\times\Omega)}$ and $\|f - \tilde f\|_{L^2(D\times\Omega)} = 0$.
The bound remains strictly positive for all values of $n$ since $\|a - a^{(n,\zeta)}\|_{L^\infty(D\times\Omega)}$
does not converge to 0 as $n\to\infty$. ◇

The stochastic partial differential equation in this example is also solved in Exam- 
ples 9.12, 9.13, 9.16, and 9.19 by the stochastic reduced order models, stochastic 
Galerkin, and stochastic collocation methods to illustrate the implementation of these 
methods and assess their relative accuracy and computational efficiency. 
Fig. 9.8 Expectation of $U(x)$ (top left panel) and samples of $U^{(n,\zeta)}(x)$ for $n = 4, 16, 100, 256$, and
576
We conclude this section with the observation that PDEs can be viewed as fil- 
ters whose solutions to random input are random fields. The spectral properties of 
these fields depend on the input probability law and the functional form of the PDE
under consideration. These stochastic equations can be used to generate samples of 
Gaussian random fields with specified spectral properties. 

Example 9.9 Consider the SPDE $\mathcal{L}[U(x)] = V(x)$, $x\in\mathbb{R}^d$, in (9.40) and suppose
$\mathcal{L} = (\Delta - \alpha^2)^{(2p)}$ and $V(x)$ is a weakly homogeneous random field with
mean 0 and spectral density $s(\nu) = 1(\nu\in R)$, where $\alpha>0$ is a constant, $p\ge 1$
is an integer, $\Delta$ denotes the Laplace operator, the superscript $(2p)$ indicates that
the operator in parentheses is applied $2p$ times, and $R = [-\bar\nu,\bar\nu]^d$, $0<\bar\nu<\infty$.
The solution of this equation to $V(x) = e^{i\,\nu\cdot x}$ is $e^{i\,\nu\cdot x}/(\|\nu\|^2 + \alpha^2)^{2p}$, so that $U(x)$
has mean zero and spectral density $s_u(\nu) = 1(\nu\in R)/(\|\nu\|^2 + \alpha^2)^{2p}$ ([24], Theorems
2.1.2 and 2.4.1, and Sect. 3.5.3 in this book). Figure 9.10 shows spectral densities
of $U(x)$ for $d = 2$, $\bar\nu = 5$, $\alpha = 10$, $p = 1$ (left panel), and $p = 2$ (right panel).
Alternative spectral densities can be obtained for $U(x)$ by using other functional forms
for $\mathcal{L}$. ◇

Fig. 9.9 Standard deviations of $U(x)$ (top left panel) and $U^{(n,\zeta)}(x)$ for $n = 4, 16, 100, 256$, and 576

Fig. 9.10 Spectral densities of $U(x)$ for $\alpha = 10$, $p = 1$ (left panel), and $p = 2$ (right panel)


9.4.5 Stochastic Reduced Order Model Method 

We construct approximations $\tilde U(x,t)$ for the solution $U(x,t)$ of (9.33) by representing
the random entries in the definition of this equation by simple random elements,
referred to as stochastic reduced order models (SROMs). Two SROM-based
methods are presented for solving stochastic partial differential equations. The first
method approximates the solution $U$ by a SROM $\tilde U$ derived from a SROM of the
random entries in (9.33). The second method represents the dependence of $U$ on
the random entries of (9.33) by piecewise linear functions. The construction of this
representation is less simple than that in the first method, but provides a more accurate
approximation for $U$. We refer to the first and second methods as SROM- and
extended stochastic reduced order model (ESROM)-based solutions. The implementation
of the first method is discussed in Sect. 9.4.6. The method is applied to solve
initial and boundary value problems (Sects. 9.4.6.1 and 9.4.6.2). The implementation
of the second method is discussed in Sect. 9.4.7. An example is used to illustrate the
application of the method and assess its accuracy.


9.4.6 Stochastic Reduced Order Models 

We construct a SROM $\tilde U(x,t)$ for the solution $U(x,t)$ of (9.33) and use it to find
properties of $U(x,t)$ approximately. The construction involves three steps. First, a
SROM $\tilde X$ is developed for the random entries in (9.33) that are collected in a random
element $X$. Second, a SROM $\tilde U(x,t)$ is constructed for $U(x,t)$ from the solutions of
(9.33) with $X$ replaced by the samples of $\tilde X$ and the probabilities of these samples.
Third, properties of $U(x,t)$ are approximated by those of $\tilde U(x,t)$.

Let $X$ be a random element defined on a probability space $(\Omega,\mathcal{F},P)$ with values
in a metric space $(S,d)$, where $d$ denotes a metric on $S$. For example, $X$ is a real-valued
random variable, a random vector, or a real-valued random function with continuous
samples defined on $[0,1]$ if $S$ is the real line $\mathbb{R}$, the Euclidean space $\mathbb{R}^d$, or the space
of real-valued continuous functions $C[0,1]$ defined on $[0,1]$.

A SROM $\tilde X$ for $X$ is a simple random element with $m$ distinct outcomes $\tilde x =
(\tilde x^{(1)},\ldots,\tilde x^{(m)})$ occurring with probabilities $p = (p_1,\ldots,p_m)$, that is,
$P(\tilde X = \tilde x^{(k)}) = p_k$, where $p_k\ge 0$, $k = 1,\ldots,m$, and $\sum_{k=1}^m p_k = 1$. Any set $\tilde x$ of distinct
points in $S$ can be used for the range of $\tilde X$. However, it is unlikely that the probability
law of $\tilde X$ corresponding to an arbitrary selection of $\tilde x$ and $p$ will be similar to that of $X$.
We are interested in SROMs $\tilde X$ with size $m$ that are optimal in some sense (Sect. A.3).
We denote the samples of any random element $W$ by $\{w^{(i)}\}$ or $\{w_i\}$. Similarly, the
samples of a SROM $\tilde W$ for $W$ are denoted by $\{\tilde w^{(k)}\}$ or $\{\tilde w_k\}$.

Suppose the size $m$ and the range $(\tilde x^{(1)},\ldots,\tilde x^{(m)})$ of a SROM $\tilde X$ have been selected
and that our objective is to find optimal values for $p = (p_1,\ldots,p_m)$. The model size
$m$ is essentially determined by the computational effort required to solve $m$ deterministic
versions of (9.33) obtained by setting $X$ equal to samples of $\tilde X$. The discrepancy
between $\tilde X$ and $X$ can be measured by, for example, the error $\sum_{q\ge 1}\alpha_q\,e_q(p)$,
where $\alpha_q>0$ are weighting factors and $e_q(p)$ measure differences between various
properties of $\tilde X$ and $X$. For example, if $X$ is a real-valued random variable with
finite moments $\mu_r = E[X^r]$, $r = 1, 2, \ldots$, and distribution $F$, then we may set
$e_1(p,\tilde x) = \max_x|\tilde F(x) - F(x)|$ or $e_1(p,\tilde x) = \int(\tilde F(x) - F(x))^2\,w(x)\,dx$ and
$e_2(p,\tilde x) = \sum_{r\ge 1} w_r\,(\tilde\mu_r - \mu_r)^2$, where $w(x)>0$ is an integrable function, $w_r>0$
are weighting factors, $\tilde\mu_r = \sum_{k=1}^m (\tilde x^{(k)})^r\,p_k$, and $\tilde F(x) = \sum_{k=1}^m 1(\tilde x^{(k)}\le x)\,p_k$.

The defining parameters $\tilde x = (\tilde x^{(1)},\ldots,\tilde x^{(m)})$ and $p = (p_1,\ldots,p_m)$ for a SROM
$\tilde X$ with size $m$ are such that they minimize the objective function

$$e(p,\tilde x) = \sum_{q\ge 1}\alpha_q\,e_q(p,\tilde x) \tag{9.67}$$

under the constraints $p_k\ge 0$, $k = 1,\ldots,m$, and $\sum_{k=1}^m p_k = 1$.

We present an algorithm for finding optimal values of $p$ for specified samples
$\tilde x = (\tilde x^{(1)},\ldots,\tilde x^{(m)})$ of $X$, that is, the vector $p$ minimizing the objective function

$$e(p) = \sum_{q\ge 1}\alpha_q\,e_q(p) \tag{9.68}$$

under the same constraints as in (9.67), where the errors $e_q(p)$ are equal to $e_q(p,\tilde x)$ for
a specified sample $\tilde x$. Suppose $n_{\mathrm{set}}$ sets of $m$ independent samples of $X$ have been
generated and let $p^{\mathrm{opt},j}$ denote the optimal probability vector for set
$j = 1,\ldots,n_{\mathrm{set}}$, that is, the vector minimizing the objective function for this set.
The SROM $\tilde X$ corresponds to the set of $m$ samples that has the minimum optimal
error, that is, $\tilde X$ has the range of set $j_0\in\{1,2,\ldots,n_{\mathrm{set}}\}$ and probability vector $p^{\mathrm{opt},j_0}$
if $e(p^{\mathrm{opt},j_0})\le e(p^{\mathrm{opt},j})$, $j = 1,\ldots,n_{\mathrm{set}}$. Alternative algorithms for constructing
SROMs can be found in [19].

Suppose a SROM $\tilde X$ with defining parameters $(\tilde x^{(k)}, p_k)$, $k = 1,\ldots,m$, has
been selected for the random elements in (9.33). Let $\tilde u^{(k)}(x,t)$, $k = 1,\ldots,m$,
be solutions of (9.33) for $X$ set equal to the samples $\tilde x^{(k)}$, $k = 1,\ldots,m$, of $\tilde X$.
The solutions $(\tilde u^{(1)}(x,t),\ldots,\tilde u^{(m)}(x,t))$ and the probabilities $(p_1,\ldots,p_m)$ of the
samples of $\tilde X$ define a SROM $\tilde U(x,t)$ for $U(x,t)$. Any deterministic solver can be
used to find $\tilde u^{(k)}(x,t)$, $k = 1,\ldots,m$. Properties of $\tilde U(x,t)$ can be calculated simply.
For example, the marginal moments of order $r\ge 1$, the marginal distribution, and
the correlation function of $\tilde U(x,t)$ are

$$\begin{aligned}
E[\tilde U(x,t)^r] &= \sum_{k=1}^m (\tilde u^{(k)}(x,t))^r\,p_k, \\
P(\tilde U(x,t)\le\xi) &= \sum_{k=1}^m 1(\tilde u^{(k)}(x,t)\le\xi)\,p_k, \text{ and} \\
E[\tilde U(x,s)\,\tilde U(y,t)] &= \sum_{k=1}^m \tilde u^{(k)}(x,s)\,\tilde u^{(k)}(y,t)\,p_k,
\end{aligned} \tag{9.69}$$

and can be used to approximate properties of $U(x,t)$.
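
The statistics of a SROM solution reduce to probability-weighted sums over the $m$ deterministic solutions. The sketch below evaluates the three expressions at fixed arguments; the numerical values of the solutions and of the probabilities are made up for illustration, and the function names are hypothetical.

```python
def srom_moment(u, p, r):
    """Marginal moment of order r: sum_k (u_k)^r p_k."""
    return sum(uk ** r * pk for uk, pk in zip(u, p))

def srom_cdf(u, p, xi):
    """Marginal distribution: sum_k 1(u_k <= xi) p_k."""
    return sum(pk for uk, pk in zip(u, p) if uk <= xi)

def srom_correlation(u_xs, u_yt, p):
    """Correlation: sum_k u_k(x, s) u_k(y, t) p_k."""
    return sum(a * b * pk for a, b, pk in zip(u_xs, u_yt, p))

# Hypothetical values of m = 3 deterministic solutions at (x, s) and (y, t),
# with the probabilities of the corresponding SROM samples.
u_xs = [0.2, 0.5, 0.9]
u_yt = [1.0, 2.0, 3.0]
p = [0.3, 0.5, 0.2]

mean = srom_moment(u_xs, p, 1)
second = srom_moment(u_xs, p, 2)
cdf = srom_cdf(u_xs, p, 0.5)
corr = srom_correlation(u_xs, u_yt, p)
print(mean, second, cdf, corr)
```

Once the $m$ deterministic solutions are stored, these statistics cost only $O(m)$ operations per evaluation point, so no further calls to the solver are needed.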

The objective function of the optimization algorithm in (9.68) quantifies differences
between global properties of $\tilde X$ and $X$ so that these random elements may or may
not be defined on the same probability space. Similarly, comparisons between global
properties of the solutions $\tilde U(x,t)$ and $U(x,t)$ do not require specifying whether $\tilde X$
and $X$ are defined on the same probability space. On the other hand, the construction
of bounds on the error $|\tilde U(x,t) - U(x,t)|$ requires specifying probability spaces for
$\tilde X$ and $X$. Since the range $(\tilde x^{(1)},\ldots,\tilde x^{(m)})$ of $\tilde X$ consists of $m$ samples of $X$, we view
$\tilde X$ as a measurable image of $X$.

We construct an approximate mapping $X \mapsto \tilde X = h(X)$ under the assumption
that $X$ can be represented satisfactorily by a large number $n$ of independent samples. In
this setting, the construction of $h$ is equivalent to that of a partition $\{\mathcal{C}_k,\ k = 1, \ldots, m\}$
of $(x^{(1)}, \ldots, x^{(n)})$ such that all $x^{(i)}$ in $\mathcal{C}_k$ are mapped into $\tilde x^{(k)}$ and $p_k \simeq n_k/n$ for all
$k = 1, \ldots, m$, where $n_k$ is the cardinality of $\mathcal{C}_k$. The partition $\{\mathcal{C}_k\}$ of $(x^{(1)}, \ldots, x^{(n)})$
can be constructed in two steps.

First, construct a preliminary partition $\{\mathcal{C}'_k\}$ of $(x^{(1)}, \ldots, x^{(n)})$ by assigning a sample
$x^{(i)}$ to $\mathcal{C}'_k$ if it is closer to $\tilde x^{(k)}$ than to any other $\tilde x^{(l)}$, that is, $x^{(i)}$ is assigned to
$\mathcal{C}'_k$ if $d(x^{(i)}, \tilde x^{(k)}) < d(x^{(i)}, \tilde x^{(l)})$, $l \ne k$, where $d$ is a metric in the image space of $X$.
If $d(x^{(i)}, \tilde x^{(k)}) = d(x^{(i)}, \tilde x^{(l)})$ for a pair $(k, l)$, $x^{(i)}$ is assigned to either $\mathcal{C}'_k$ or $\mathcal{C}'_l$.
Generally, the cardinalities $n'_k$ of $\mathcal{C}'_k$ do not satisfy the condition $p_k \simeq n'_k/n$.

Second, eliminate the members of the clusters $\mathcal{C}'_k$ with $n'_k/n > p_k$ that are the farthest
from $\tilde x^{(k)}$ until the reduced versions $\mathcal{C}''_k$ of these clusters satisfy the condition
$n''_k/n \simeq p_k$, where $n''_k$ denotes the cardinality of $\mathcal{C}''_k$. The members extracted from the
clusters $\mathcal{C}'_k$ with $n'_k/n > p_k$ are assigned to the clusters with $n'_k/n < p_k$ based on
their closeness to the nuclei $\tilde x^{(k)}$ of these clusters and the requirement $n_k/n \simeq p_k$.

The algorithm delivers a partition $\{\mathcal{C}_k,\ k = 1, \ldots, m\}$ of $(x^{(1)}, \ldots, x^{(n)})$ such that the
members $x^{(i)}$ of $\mathcal{C}_k$ are mapped into $\tilde x^{(k)}$ and the probability that $\tilde X$ takes values in
$\mathcal{C}_k$ is $n_k/n \simeq p_k$. The resulting mapping is not unique. It depends on the particular
sample $(x^{(1)}, \ldots, x^{(n)})$ used to characterize $X$, the model size $m$, and the metric $d$.
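The two-step construction of the partition can be sketched as follows. This is an illustration under simplifying assumptions, not the implementation used for the results in this section: the samples are scalars, the metric is $d(x, y) = |x - y|$, the target cardinalities $n_k \simeq p_k n$ are obtained by a largest-remainder rounding, and the function name `build_partition` is hypothetical.

```python
from typing import List, Sequence


def build_partition(samples: Sequence[float], nuclei: Sequence[float],
                    probs: Sequence[float]) -> List[List[int]]:
    """Return clusters[k] = indices i of the samples mapped into nuclei[k]."""
    n, m = len(samples), len(nuclei)

    def dist(i: int, k: int) -> float:
        return abs(samples[i] - nuclei[k])  # assumed metric d

    # Target cardinalities n_k ~ p_k * n (largest-remainder rounding, an assumption).
    raw = [p * n for p in probs]
    target = [int(v) for v in raw]
    by_remainder = sorted(range(m), key=lambda k: raw[k] - target[k], reverse=True)
    for k in by_remainder[: n - sum(target)]:
        target[k] += 1

    # Step 1: preliminary partition, each sample goes to its closest nucleus
    # (ties are resolved in favor of the lowest index).
    clusters: List[List[int]] = [[] for _ in range(m)]
    for i in range(n):
        clusters[min(range(m), key=lambda k: dist(i, k))].append(i)

    # Step 2a: remove the farthest members from clusters that are too large.
    pool: List[int] = []
    for k in range(m):
        clusters[k].sort(key=lambda i: dist(i, k))
        while len(clusters[k]) > target[k]:
            pool.append(clusters[k].pop())

    # Step 2b: reassign removed members to the closest cluster that is too small.
    for i in pool:
        open_clusters = [k for k in range(m) if len(clusters[k]) < target[k]]
        clusters[min(open_clusters, key=lambda k: dist(i, k))].append(i)
    return clusters


# Hypothetical data: n = 10 samples, m = 2 nuclei with equal probabilities.
xs = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.9, 1.5, 2.0, 2.5]
clusters = build_partition(xs, nuclei=[0.3, 2.0], probs=[0.5, 0.5])
```

In this toy case the nearest-nucleus rule first assigns seven samples to the first nucleus; the two farthest ones (0.9 and 0.6) are then moved to the second cluster so that both cardinalities match $p_k n = 5$.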




9.4.6.1 Initial-Boundary Value Problems 


Suppose the spatial derivatives in (9.33) are approximated by finite differences, so
that this equation becomes an ODE of the type in (9.27) with state vector $Y(t)$ whose
coordinates are values of $U(x,t)$ at the nodes of the finite difference mesh. If the
random coefficients in the definition of $\mathcal{L}$ are time invariant, $Y(t)$ is the solution
of $\dot Y(t) = A\, Y(t) + W(t)$ with initial state and driving noise $W(t)$ inferred from
$U(x,0)$ and $V(x,t)$, where $A$ is a random matrix describing $\mathcal{L}$. Suppose $X$ includes
the random elements in the definition of $\mathcal{L}$. Let $Y^{(i)}(t)$ and $\tilde Y^{(k)}(t)$ be the solutions
of $\dot Y^{(i)}(t) = A^{(i)}\, Y^{(i)}(t) + W(t)$ and $\dot{\tilde Y}^{(k)}(t) = \tilde A^{(k)}\, \tilde Y^{(k)}(t) + W(t)$, where the matrices
$A^{(i)}$ and $\tilde A^{(k)}$ are equal to $A$ with $X$ replaced by $x^{(i)}$ and $\tilde x^{(k)}$, respectively. The
processes $Y^{(i)}(t)$ and $\tilde Y^{(k)}(t)$ are approximations for the solutions $U^{(i)}(x,t)$ and
$\tilde U^{(k)}(x,t)$ of $\mathcal{L}[U(x,t)] = V(x,t)$ with $X$ set equal to $x^{(i)}$ and $\tilde x^{(k)}$.

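Theorem 9.5 Let $Y(t)$ and $\tilde Y(t)$ be the solutions of $\dot Y(t) = A\, Y(t) + W(t)$ and
$\dot{\tilde Y}(t) = \tilde A\, \tilde Y(t) + W(t)$ as previously defined. Then

$$
E\big[\|Y(t) - \tilde Y(t)\|^q\big]
\le \frac{1}{n} \sum_{k=1}^{m} \sum_{x^{(i)} \in \mathcal{C}_k}
\Big( e^{\lambda_{i,\max} t} \int_0^t e^{-\lambda_{i,\max} s}\, \delta_{k,i}(s)\, ds \Big)^q
\simeq \sum_{k=1}^{m} p_k \Big[ \frac{1}{n_k} \sum_{x^{(i)} \in \mathcal{C}_k}
\Big( e^{\lambda_{i,\max} t} \int_0^t e^{-\lambda_{i,\max} s}\, \delta_{k,i}(s)\, ds \Big)^q \Big],
\tag{9.70}
$$

where $\lambda_{i,\max}$ is the largest eigenvalue of $(A^{(i)} + (A^{(i)})')/2$, $\delta_{k,i}(t) = \|(A^{(i)} - \tilde A^{(k)})\, \tilde Y^{(k)}(t)\|$,
and $p_k \simeq n_k/n$.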

Proof Since $U^{(i)}(x,t)$ and $\tilde U^{(k)}(x,t)$ are driven by the same input, we have (see (7.115))

$$
\|\tilde Y^{(k)}(t) - Y^{(i)}(t)\| \le e^{\lambda_{i,\max} t} \int_0^t e^{-\lambda_{i,\max} s}\, \delta_{k,i}(s)\, ds, \quad i = 1, \ldots, n.
\tag{9.71}
$$

If $A^{(i)} = \tilde A^{(k)}$, then $\delta_{k,i}(s) = 0$ at all times, and so is the bound on $\|\tilde Y^{(k)}(t) - Y^{(i)}(t)\|$.
If $A^{(i)} \ne \tilde A^{(k)}$, the bound on $\|\tilde Y^{(k)}(t) - Y^{(i)}(t)\|$ can be calculated from
(9.71). It depends on $\tilde Y^{(k)}(t)$ over the time interval of interest, the real-valued function
$\delta_{k,i}(t)$, and the largest eigenvalues $\lambda_{i,\max}$ of $(A^{(i)} + (A^{(i)})')/2$, but does not depend
on $Y^{(i)}(t)$, $i = 1, \ldots, n$. The inequalities in (9.71), the representation of $X$ by $n$
independent samples, and its SROM $\tilde X$ yield the bound in (9.70). ∎

Arguments yielding the bound in (9.70) can be extended directly to bound the
discrepancy between the solutions $Y(t)$ and $\tilde Y(t)$ if $X$ is characterized by its probability
law rather than $n$ independent samples. Let $Y(t; x)$ denote the solution of $\dot Y(t) =
A\, Y(t) + W(t)$ with $A$ set equal to $A(x)$, $x \in I = X(\Omega)$, and $\tilde Y^{(k)}(t)$ as previously.
The bound in (9.71) gives

$$
\|\tilde Y^{(k)}(t) - Y(t; x)\| \le e^{\lambda_{\max}(x)\, t} \int_0^t e^{-\lambda_{\max}(x)\, s}\, \delta_k(s; x)\, ds,
\tag{9.72}
$$

where $\lambda_{\max}(x)$ denotes the largest eigenvalue of $(A(x) + A(x)')/2$ and $\delta_k(s; x) =
\|(A(x) - \tilde A^{(k)})\, \tilde Y^{(k)}(s)\|$. Let $\{I_k,\ k = 1, \ldots, m\}$ be a partition of the range $I$
of $X$ such that $p_k \simeq \int_{I_k} f(x)\, dx$, where $f(x)$ denotes the density of $X$. Then, the
expectation of $\|Y(t) - \tilde Y(t)\|^q$ can be bounded by

$$
E\big[\|Y(t) - \tilde Y(t)\|^q\big] \le \sum_{k=1}^{m} \int_{I_k}
\Big( e^{\lambda_{\max}(x)\, t} \int_0^t e^{-\lambda_{\max}(x)\, s}\, \delta_k(s; x)\, ds \Big)^q f(x)\, dx.
\tag{9.73}
$$

Example 9.10 Let $Y(t)$ be the solution of the stochastic ordinary differential equation
$\dot Y(t) = -A\, Y(t) + 1$, $t \ge 0$, where $d = 1$, $Y(0) = 0$, and $A$ is a shifted Gamma
random variable with density $f(z) = (z - \zeta)^{r-1} \lambda^r e^{-\lambda (z - \zeta)} / \Gamma(r)$, $z - \zeta > 0$,
depending on some constants $\zeta, r, \lambda > 0$. The discrepancy between two deterministic
solutions for $A$ equal to $a'$ and $a''$ can be bounded by
$|Y(t; a') - Y(t; a'')| \le e^{-a' t} \int_0^t e^{a' s}\, \delta(s; a', a'')\, ds$, where $\delta(t; a', a'') = |a' - a''|\, |Y(t; a'')|$, by (9.72). Since
the exact solution is $Y(t) = (1 - \exp(-A\, t))/A$, the discrepancy $|Y(t; a') - Y(t; a'')|$ can also be
calculated exactly.
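The bound for two deterministic solutions can be checked numerically against the exact discrepancy. The sketch below assumes the scalar setting of this example, so that $\lambda_{\max} = -a'$, and evaluates the time integral by the trapezoidal rule; the values of $a'$, $a''$, and $t$ are hypothetical.

```python
import math


def y_exact(t: float, a: float) -> float:
    """Exact solution of dY/dt = -a Y + 1 with Y(0) = 0."""
    return (1.0 - math.exp(-a * t)) / a


def discrepancy_bound(t: float, a1: float, a2: float, steps: int = 2000) -> float:
    """e^{-a1 t} * int_0^t e^{a1 s} |a1 - a2| |Y(s; a2)| ds (trapezoidal rule)."""
    h = t / steps

    def integrand(s: float) -> float:
        return math.exp(a1 * s) * abs(a1 - a2) * abs(y_exact(s, a2))

    inner = sum(integrand(j * h) for j in range(1, steps))
    integral = h * (0.5 * (integrand(0.0) + integrand(t)) + inner)
    return math.exp(-a1 * t) * integral


# Hypothetical parameter values a' = 0.5, a'' = 1.0 and time t = 2.
a_prime, a_second, t_end = 0.5, 1.0, 2.0
exact_diff = abs(y_exact(t_end, a_prime) - y_exact(t_end, a_second))
bound = discrepancy_bound(t_end, a_prime, a_second)
```

For these values the computed bound is not smaller than the exact discrepancy, as required by (9.72).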

Let $(\tilde a_1, \ldots, \tilde a_m)$ and $(p_1, \ldots, p_m)$ be the defining parameters of a SROM $\tilde A$
for $A$, and let $(I_1, \ldots, I_m)$ be a partition of the range $[\zeta, \infty)$ of $A$, $\tilde a_k \in I_k$, and
$p_k = P(A \in I_k)$, $k = 1, \ldots, m$. Since $\lambda_{\max}$ is determined by the sample values of $A$
(here $\lambda_{\max} = -a$ for $A = a$), $\delta_{k,i}(t) = |a_i - \tilde a_k|\, |\tilde Y_k(t)|$, and
$\delta_k(t; a) = |a - \tilde a_k|\, |\tilde Y_k(t)|$, where $\tilde Y_k(t)$ is the solution for $A = \tilde a_k$, the bound in (9.73) gives

$$
E\big[|Y(t) - \tilde Y(t)|^q\big] \le \sum_{k=1}^{m} \int_{I_k}
\Big( e^{-a t} \int_0^t e^{a s}\, \delta_k(s; a)\, ds \Big)^q f_A(a)\, da.
\tag{9.74}
$$

Numerical results in Fig. 9.11 are for $q = 1$, $\zeta = 0$, $r = 2$, $\lambda = 3$, $m = 4$, $I_1 =
[0, 0.372)$, $I_2 = [0.372, 1.020)$, $I_3 = [1.020, 1.780)$, $I_4 = [1.780, \infty)$, and a
SROM $\tilde A$ of $A$ with defining parameters $(\tilde a_1, \ldots, \tilde a_4) = (0.1640, 0.7253, 1.161,
2.4418)$ and $(p_1, \ldots, p_4) = (0.3068, 0.5028, 0.160, 0.0304)$. The dotted and solid
lines in Fig. 9.11 are the expectation $E[|Y(t) - \tilde Y(t)|]$ and the bound on this expectation
in (9.74) for $\tilde Y(t)$ corresponding to a SROM $\tilde A$ for $A$ with $m = 4$ samples. ◊


Fig. 9.11 Exact expectation $E[|Y(t) - \tilde Y(t)|]$ (dotted line) and the bound on this
expectation in (9.74) (solid line)

Example 9.11 Let $U(x,t)$, $x \in (0, l)$, $t \ge 0$, be the solution of the SPDE in (9.60)
with boundary conditions $U(0,t) = U(l,t) = 0$ a.s. The coordinates of the vector
$Y(t)$ are values of $U(x,t)$ at a finite number of coordinates $x \in [0, l]$. We have seen
that $Y(t)$ satisfies an ordinary differential equation of the type in (9.27).

The following numerical results are for $l = 1$, $\lambda = 10$, a uniform distribution $F$ in
the interval $[a, b] = [2, 6]$, and a partition of $[0, 1]$ in 11 equal intervals, so that $Y(t)$ is a
ten-dimensional vector if we account for the boundary conditions. Figure 9.12 shows
with solid lines estimates of the expectation $E[\|Y(t) - \tilde Y(t)\|^q]$ for $q = 1$ (left panel)
and $q = 2$ (right panel). The estimates are based on 500 samples of $Y(t)$, corresponding
to 500 independent samples of $\Sigma(x)$, and on a SROM $\tilde Y(t)$ with $m = 10$ samples. The
dotted lines are bounds on this expectation. The left panel in Fig. 9.13 shows with solid and
dotted heavy lines an estimate of $E[\|Y(t)\|]$ obtained from 500 independent samples
of $Y(t)$ and the expectation $E[\|\tilde Y(t)\|]$ corresponding to a SROM of $\Sigma(x)$ with
$m = 10$. The thin solid lines in the right panel of the figure are estimates of $E[\|Y(t)\|]$
calculated from 50 sets of ten independent samples of $\Sigma(x)$. In contrast to the
SROM-based approximation of $E[\|Y(t)\|]$, approximations of this expectation based
on independent samples of size ten exhibit significant sample-to-sample variation.
Results similar to those in Fig. 9.13 are in Fig. 9.14 for the second moments of
$\|Y(t)\|$ and $\|\tilde Y(t)\|$. The estimates of $E[\|Y(t)\|^2]$ in the right panel corresponding
to 50 sets of ten independent samples of $\Sigma(x)$ exhibit notable variation from one set to
another. ◊

Fig. 9.12 Monte Carlo estimates of $E[\|Y(t) - \tilde Y(t)\|^q]$ (solid lines) and bounds on this expectation
(dotted lines) for a SROM with $m = 10$, $q = 1$ (left panel), and $q = 2$ (right panel)

Fig. 9.13 Estimate of $E[\|Y(t)\|]$ obtained from 500 samples (solid heavy line, left panel), an
approximation of $E[\|\tilde Y(t)\|]$ based on a SROM of $\Sigma(x)$ with $m = 10$ (dotted heavy line, left panel),
and estimates of $E[\|Y(t)\|]$ obtained from 50 sets of 10 independent samples of $\Sigma(x)$ (continuous
thin lines, right panel)

Fig. 9.14 Estimate of $E[\|Y(t)\|^2]$ obtained from 500 samples (solid heavy line, left panel), an
approximation of $E[\|\tilde Y(t)\|^2]$ based on a SROM of $\Sigma(x)$ with $m = 10$ (dotted heavy line, left
panel), and estimates of $E[\|Y(t)\|^2]$ obtained from 50 sets of 10 independent samples of $\Sigma(x)$
(continuous thin lines, right panel)


9.4.6.2 Boundary Value Problems 

Consider the stochastic boundary value problem in (9.53) and suppose that the
random entries in this equation are collected in a random element $X$ defined on
a probability space $(\Omega, \mathcal{F}, P)$. Let $\tilde X$ be a SROM for $X$ with defining parameters
$(\tilde x^{(k)}, p_k)$, $k = 1, \ldots, m$. Suppose $X$ can be characterized accurately by a large number
$n \gg m$ of independent, equally likely samples $\{x^{(i)}\}$ and that these samples are
clustered in subsets $\mathcal{C}_k$ of $(x^{(1)}, \ldots, x^{(n)})$ with cardinality $n_k$ such that $p_k \simeq n_k/n$.

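The weak solutions $U(x)$ and $\tilde U(x)$ of (9.53) corresponding to the representation of
$X$ by $n$ independent, equally likely samples $\{x^{(i)}\}$ and its SROM $\tilde X$ with parameters
$(\tilde x^{(k)}, p_k)$, $k = 1, \ldots, m$, satisfy the equations $\mathcal{B}(U, W) = (f, W)_{L^2(D \times \Omega)}$ and
$\tilde{\mathcal{B}}(\tilde U, W) = (\tilde f, W)_{L^2(D \times \Omega)}$, where

$$
\begin{aligned}
\mathcal{B}(U, W) &= \frac{1}{n} \sum_{i=1}^{n} \int_D a^{(i)}(x)\, \nabla U^{(i)}(x) \cdot \nabla W^{(i)}(x)\, dx \\
&= \sum_{k=1}^{m} p_k \Big[ \frac{1}{n_k} \sum_{x^{(i)} \in \mathcal{C}_k} \int_D a^{(i)}(x)\, \nabla U^{(i)}(x) \cdot \nabla W^{(i)}(x)\, dx \Big], \\
\tilde{\mathcal{B}}(\tilde U, W) &= \sum_{k=1}^{m} p_k \Big[ \frac{1}{n_k} \sum_{x^{(i)} \in \mathcal{C}_k} \int_D \tilde a^{(k)}(x)\, \nabla \tilde U^{(k)}(x) \cdot \nabla W^{(i)}(x)\, dx \Big],
\end{aligned}
\tag{9.75}
$$

$$
\begin{aligned}
(f, W)_{L^2(D \times \Omega)} &= \sum_{k=1}^{m} p_k \Big[ \frac{1}{n_k} \sum_{x^{(i)} \in \mathcal{C}_k} \int_D f^{(i)}(x)\, W^{(i)}(x)\, dx \Big], \\
(\tilde f, W)_{L^2(D \times \Omega)} &= \sum_{k=1}^{m} p_k \Big[ \frac{1}{n_k} \sum_{x^{(i)} \in \mathcal{C}_k} \int_D \tilde f^{(k)}(x)\, W^{(i)}(x)\, dx \Big],
\end{aligned}
\tag{9.76}
$$

$U^{(i)}$, $f^{(i)}$, $a^{(i)}$, and $W^{(i)}$ correspond to $x^{(i)}$, $i = 1, \ldots, n$, and $\tilde U^{(k)}$, $\tilde f^{(k)}$, and $\tilde a^{(k)}$
correspond to $\tilde x^{(k)}$, $k = 1, \ldots, m$.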

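The functionals in (9.75) and (9.76) satisfy the conditions of Theorem 9.4 so that
the bound in (9.57) holds with

$$
\begin{aligned}
\|U - \tilde U\|^2_{W(D,\Omega)} &= \sum_{k=1}^{m} p_k \Big[ \frac{1}{n_k} \sum_{x^{(i)} \in \mathcal{C}_k} \int_D \big( V^{(i,k)}(x)^2 + \nabla V^{(i,k)}(x) \cdot \nabla V^{(i,k)}(x) \big)\, dx \Big], \\
\|a - \tilde a\|_{L^\infty(D \times \Omega)} &= \max_{1 \le k \le m}\ \max_{x^{(i)} \in \mathcal{C}_k}\ \sup_{x \in D} |a^{(i)}(x) - \tilde a^{(k)}(x)|, \\
\|\tilde U\|^2_{W(D,\Omega)} &= \sum_{k=1}^{m} p_k \int_D \big( \tilde U^{(k)}(x)^2 + \nabla \tilde U^{(k)}(x) \cdot \nabla \tilde U^{(k)}(x) \big)\, dx, \\
\|f - \tilde f\|^2_{L^2(D \times \Omega)} &= \sum_{k=1}^{m} p_k \Big[ \frac{1}{n_k} \sum_{x^{(i)} \in \mathcal{C}_k} \int_D \big( f^{(i)}(x) - \tilde f^{(k)}(x) \big)^2\, dx \Big],
\end{aligned}
\tag{9.77}
$$

where $V^{(i,k)}(x) = U^{(i)}(x) - \tilde U^{(k)}(x)$, and $f^{(i)}(x)$ and $\tilde f^{(k)}(x)$ correspond to $x^{(i)}$
and $\tilde x^{(k)}$, respectively.

Example 9.12 Suppose the conductivity $\Sigma(x)$ of a rectangular two-dimensional
specimen $D = (0, l_1) \times (0, l_2)$ can be modeled by a homogeneous random field. The
potential $U$ satisfies the stochastic boundary value problem $-\nabla \cdot (\Sigma(x)\, \nabla U(x, \omega)) = 0$
in $D$ with the boundary conditions $U(0, x_2) = 0$, $U(l_1, x_2) = 1$, $\partial U(x_1, x_2)/\partial x_2 = 0$
for $x_2 = 0$ and $x_2 = l_2$, $P$-a.s. Our objective is to find properties of the effective
conductivity $\Sigma_{\text{eff}}$ approximately by SROMs. We have

$$
\Sigma_{\text{eff}}(\omega) = \frac{1}{l_2} \int_D \Sigma(x, \omega)\, \frac{\partial U(x, \omega)}{\partial x_1}\, dx,
\tag{9.78}
$$

where $U(x, \omega)$ denotes the potential in the specimen corresponding to a sample
$\Sigma(x, \omega)$ of $\Sigma(x)$.

Fig. 9.15 Two samples of a SROM $\tilde\Sigma(x)$ with $m = 10$ for $\Sigma(x)$ with range $[\alpha = 1, \beta = 8]$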

It is assumed that $\Sigma(x)$ is the Beta translation field in (9.62), so that it takes values
in $[\alpha, \beta]$, $0 < \alpha < \beta < \infty$, irrespective of the spectral density of its Gaussian image
$G(x)$. The marginal moment of order $r$ of $\Sigma(x)$ is

$$
\mu(r) = E[\Sigma(x)^r] = \sum_{s=0}^{r} \frac{r!}{s!\,(r - s)!}\, (\beta - \alpha)^s\, \frac{B(p + s, q)}{B(p, q)}\, \alpha^{r - s}.
\tag{9.79}
$$
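Formula (9.79) is a binomial expansion of $(\alpha + (\beta - \alpha) B)^r$ for a Beta$(p, q)$ variable $B$ with moments $E[B^s] = B(p + s, q)/B(p, q)$, and is simple to evaluate. The following sketch uses only the Python standard library; the shape parameters $p = 1$, $q = 3$ are assumed here for illustration and are not stated for this example.

```python
import math


def beta_function(x: float, y: float) -> float:
    """Beta function B(x, y), computed from log-Gamma values."""
    return math.exp(math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y))


def translation_moment(r: int, alpha: float, beta: float, p: float, q: float) -> float:
    """Marginal moment of order r in (9.79) for values in [alpha, beta]."""
    return sum(
        math.comb(r, s)
        * (beta - alpha) ** s
        * beta_function(p + s, q) / beta_function(p, q)
        * alpha ** (r - s)
        for s in range(r + 1)
    )


# Range [alpha, beta] = [1, 8] as in the first case; p and q are assumed values.
alpha, beta, p, q = 1.0, 8.0, 1.0, 3.0
mu_1 = translation_moment(1, alpha, beta, p, q)
mu_2 = translation_moment(2, alpha, beta, p, q)
```

As a check on the expansion, the first moment reduces to $\alpha + (\beta - \alpha)\, p/(p + q)$, which is 2.75 for the assumed parameters.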


Two SROMs $\tilde\Sigma(x)$ with $m = 10$ and $m = 20$ samples have been constructed
for $\Sigma(x)$ taking values in $[\alpha = 1, \beta = 8]$ and $[\alpha = 1, \beta = 20]$, respectively.
Consider first the case $[\alpha = 1, \beta = 8]$ and $m = 10$. Figure 9.15 shows two samples
of a SROM $\tilde\Sigma(x)$ for $\Sigma(x)$ recorded in our calculations, one in the left panel and one
in the right panel. The probabilities of these samples of $\tilde\Sigma(x)$ are
0.1509 and 0.0701, respectively. The accuracy of $\tilde\Sigma(x)$ is assessed by comparing
spatial averages $\tilde\mu(r) = \sum_{k=1}^{m} p_k\, (1/v_D) \int_D \tilde\sigma_k(x)^r\, dx$ of its marginal moments of
order $r$ with corresponding moments of $\Sigma(x)$. The first six moments of $\Sigma(x)$ and
$\tilde\Sigma(x)$ are

$\mu(r)$ = [0.0028 0.0088 0.0309 0.1191 0.4939 2.1783] × 10^3
$\tilde\mu(r)$ = [0.0028 0.0086 0.0299 0.1146 0.4741 2.0910] × 10^3.

The largest error for the moments of $\tilde\Sigma(x)$ based on a SROM with $m = 10$ samples
is $\max_{r = 1, \ldots, 6} \{100\, (\mu(r) - \tilde\mu(r)) / \tilde\mu(r)\} = 4.18\%$. Samples of $\Sigma_{\text{eff}}$ and SROMs $\tilde\Sigma_{\text{eff}}$
of this random variable can be calculated from (9.78), samples $\Sigma(x, \omega)$ of $\Sigma(x)$, and
potentials $U(x, \omega)$ corresponding to these samples. The first six moments $\tilde\mu_{\text{eff}}(r)$ of
the SROM-based approximation $\tilde\Sigma_{\text{eff}}$ for $\Sigma_{\text{eff}}$ and estimates $\hat\mu_{\text{eff}}(r)$ of the first six
moments $E[\Sigma_{\text{eff}}^r]$ of $\Sigma_{\text{eff}}$ obtained from $n = 1000$ independent samples are

$\tilde\mu_{\text{eff}}(r)$ = [2.6924 7.2749 19.7232 53.6437 146.3420 400.3602]
$\hat\mu_{\text{eff}}(r)$ = [2.6478 7.0451 18.8369 50.6113 136.6463 370.7305]

for $r = 1, \ldots, 6$. The errors of $\tilde\mu_{\text{eff}}(r)$ with respect to $\hat\mu_{\text{eff}}(r)$ are 1.68%, 3.27%,
4.71%, 5.99%, 7.09%, and 7.99% for $r$ = 1, 2, 3, 4, 5, and 6, respectively.

Fig. 9.16 Two samples of a SROM $\tilde\Sigma(x)$ with $m = 20$ for $\Sigma(x)$ with range $[\alpha = 1, \beta = 20]$

Suppose now that $\Sigma(x)$ takes values in the range $[\alpha = 1, \beta = 20]$. Since $\Sigma(x)$
has much larger variability in this case, we construct for $\Sigma(x)$ a SROM $\tilde\Sigma(x)$ with
$m = 20$. Figure 9.16 shows two samples of $\tilde\Sigma$ recorded in our calculations as sample
$\tilde\sigma_3(x)$ (left panel) and sample $\tilde\sigma_{17}(x)$ (right panel). The probabilities of these samples
of $\tilde\Sigma$ are $p_3 = 0.0662$ and $p_{17} = 0.0390$. The spatial averages $\tilde\mu(r)$ of the first six
moments of $\tilde\Sigma(x)$ and the first six marginal moments $\mu(r)$ of $\Sigma(x)$ are

$\tilde\mu(r)$ = [0.0001 0.0004 0.0033 0.0309 0.3124 3.3913] × 10^5
$\mu(r)$ = [0.0001 0.0004 0.0035 0.0325 0.3283 3.5515] × 10^5

showing that the first six moments of $\tilde\Sigma(x)$ are in error by −2.6298%, −4.1114%,
−4.8654%, −5.1493%, −5.0822%, and −4.7237%. Since statistics of the effective
conductivity $\Sigma_{\text{eff}}$ are not available analytically, they have been estimated from samples
of this random variable. The first six moments $\tilde\mu_{\text{eff}}(r)$, $r = 1, \ldots, 6$, of $\tilde\Sigma_{\text{eff}}$
and corresponding estimates $\hat\mu_{\text{eff}}(r)$ of $E[\Sigma_{\text{eff}}^r]$, $r = 1, \ldots, 6$, obtained from 1000
independent samples are

$\tilde\mu_{\text{eff}}(r)$ = [0.0005 0.0028 0.0148 0.0800 0.4356 2.3942] × 10^4
$\hat\mu_{\text{eff}}(r)$ = [0.0005 0.0027 0.0144 0.0768 0.4109 2.2097] × 10^4.

The errors of $\hat\mu_{\text{eff}}(r)$ relative to $\tilde\mu_{\text{eff}}(r)$ are −0.60%, −1.45%, −2.57%, −3.98%,
−5.69%, and −7.70% for $r = 1, \ldots, 6$. Figure 9.17 shows estimates of
$E[\Sigma_{\text{eff}}^r]$, $r = 1, \ldots, 6$, obtained from five sets of 20 independent samples of this
random variable selected at random by Monte Carlo simulation. In contrast to SROM-based
approximate moments for $\Sigma_{\text{eff}}$, Monte Carlo estimates can have very large
errors and depend strongly on the particular set of 20 samples used for calculations.

Fig. 9.17 Estimates of first six moments of $\Sigma_{\text{eff}}$ based on five sets of 20 samples
each of this random variable (horizontal axis: moment order)

The discrepancy between the SROM-based approximation $\tilde\Sigma_{\text{eff}}$ and the effective
conductivity $\Sigma_{\text{eff}}$ can be bounded by

$$
|\tilde\Sigma_{\text{eff}} - \Sigma_{\text{eff}}| \le |\tilde\Sigma_{\text{eff}} - \hat\Sigma_{\text{eff}}| + |\hat\Sigma_{\text{eff}} - \Sigma_{\text{eff}}|,
\tag{9.80}
$$

where $\hat\Sigma_{\text{eff}}$ denotes a Monte Carlo estimator for $\Sigma_{\text{eff}}$ corresponding to $n$ independent
samples of $\Sigma(x)$ and $U(x)$. Theorem 9.4 can be modified to obtain an upper bound
on the error $|\tilde\Sigma_{\text{eff}} - \hat\Sigma_{\text{eff}}|$ of the SROM-based approximation for $\Sigma_{\text{eff}}$ with respect to
Monte Carlo estimates based on $n$ independent samples of $\Sigma(x)$. The second term
on the right side of (9.80) can be made as small as desired by increasing $n$. ◊

Example 9.13 Let $U(x)$ be the solution of the stochastic partial differential equation
(9.53) in Example 9.8 with $a(x)$ in (9.62), (9.63), and (9.64). Let $\tilde a(x)$ be a
SROM for $a(x)$ with samples $(\tilde a_1(x), \ldots, \tilde a_m(x))$ and probabilities $(p_1, \ldots, p_m)$.
Denote by $(\tilde u_1(x), \ldots, \tilde u_m(x))$ solutions of (9.53) with $a(x)$ set equal to the samples
$(\tilde a_1(x), \ldots, \tilde a_m(x))$ of $\tilde a(x)$, so that $\tilde U(x)$ with samples $(\tilde u_1(x), \ldots, \tilde u_m(x))$ of
probabilities $(p_1, \ldots, p_m)$ is a SROM for $U(x)$. Properties of $\tilde U(x)$ can be obtained by
elementary calculations. For example, $E[\tilde U(x)^r] = \sum_{k=1}^{m} p_k\, \tilde u_k(x)^r$ is the moment
of order $r = 1, 2, \ldots$ of $\tilde U(x)$.

Numerical results are for the parameter values in Example 9.8 and two SROMs for
$a(x)$. The first SROM has $m = 25$ samples. Figure 9.18 shows the probabilities $\{p_k\}$ of these
samples. The top panels in Fig. 9.19 show the spatial variation of the approximate
means and standard deviations of $\tilde U(x)$ corresponding to a SROM $\tilde a(x)$ of $a(x)$ with
$m = 25$ samples. The bottom panels in the figure give absolute values of the errors
of these moments relative to the Monte Carlo estimates of these moments in Example
9.8, which are based on 800 independent samples. The means and standard deviations
of the SROM $\tilde U(x)$ are satisfactory, although they are based on only $m = 25$ samples,
so that the construction of $\tilde U(x)$ involved 25 solutions of deterministic versions of
(9.53).

Similar plots are in Figs. 9.20 and 9.21, but they correspond to a SROM $\tilde a(x)$ of
$a(x)$ with size $m = 80$. The probabilities $\{p_k\}$ of the samples of $\tilde a(x)$ are plotted
in Fig. 9.20. The spatial variation of the mean and standard deviation of $\tilde U(x)$ is
shown in the top panels of Fig. 9.21. The absolute values of the errors of the moments of
$\tilde U(x)$ relative to their Monte Carlo estimates, calculated in Example 9.8 and used in
Fig. 9.19 as reference, are in the bottom panels of the figure. The increase of model
size from $m = 25$ to $m = 80$ results in superior SROM-based approximations. In
addition to smaller errors, the spatial variation of the standard deviation of $\tilde U(x)$ for
$m = 80$ traces that of the reference Monte Carlo estimates. The improvement of
the performance of $\tilde U(x)$ with the model size is rather slow, possibly because the
SROMs developed here are suboptimal. They have been extracted from $n_{\text{set}} = 40$
sets of independent samples of $a(x)$ of size $m = 25$ and $m = 80$. ◊

Fig. 9.18 Probabilities $\{p_k\}$, $k = 1, \ldots, m$, for a SROM of $a(x)$ with $m = 25$

Fig. 9.19 Mean and standard deviations of $\tilde U(x)$ for a SROM with $m = 25$ samples (top panels)
and errors relative to Monte Carlo estimates (bottom panels)

Fig. 9.20 Probabilities $\{p_k\}$, $k = 1, \ldots, m$, for a SROM of $a(x)$ with $m = 80$

Fig. 9.21 Mean and standard deviations of $\tilde U(x)$ for a SROM with $m = 80$ samples (top panels)
and errors relative to Monte Carlo estimates (bottom panels)


9.4.7 Extended Stochastic Reduced Order Models 

We have seen that the SROM-based method can be used to solve stochastic equations
efficiently. The solution $U$ of a stochastic equation is approximated by a SROM $\tilde U$
with samples $\{\tilde u_k\}$ corresponding to the samples of a SROM for the random elements
in the definition of this equation. In Example 9.13, the samples $\{\tilde u_k(x)\}$ of $\tilde U(x)$
are solutions of deterministic PDEs with $a(x)$ replaced by the samples $\{\tilde a_k(x)\}$ of a
SROM $\tilde a(x)$ of it. A limitation of the approximate solution $\tilde U(x)$ is that it does not
explore the sensitivity of $U(x)$ with respect to changes in $\{\tilde a_k(x)\}$. It is solely based
on the expressions of $U(x)$ corresponding to $\{\tilde a_k(x)\}$.

The extended version of the SROM-based method, referred to as the ESROM-based
method, attempts to overcome this limitation. The method is discussed in
Sect. A.4 for the case in which the random elements in the definition of a stochastic
equation can be described by an $\mathbb{R}^n$-valued random variable $C$ defined on a probability
space $(\Omega, \mathcal{F}, P)$. The implementation of ESROM-based solutions involves
the following three steps. First, a SROM $\tilde C$ with samples $\{\tilde c_k\}$, $k = 1, \ldots, m$, is
developed for $C$. Second, deterministic solvers are used to find solutions $\{\tilde u_k\}$ corresponding
to the samples $\{\tilde c_k\}$ of $\tilde C$ and gradients $\{\nabla \tilde u_k\}$ of $U$ with respect to the
coordinates of $C$ at $\{\tilde c_k\}$. The solution $U$ is approximated by hyperplanes tangent to
$U$ at $\{\tilde c_k\}$, that is,

$$
\tilde U_L(\cdot\,; C) = \sum_{k=1}^{m} \big[ \tilde u_k(\cdot) + \nabla \tilde u_k(\cdot) \cdot (C - \tilde c_k) \big]\, 1(C \in \Gamma_k),
\tag{9.81}
$$

where $\{\Gamma_k\}$ are the cells of a Voronoi tessellation with centers $\{\tilde c_k\}$ constructed in the
range $\Gamma = C(\Omega)$ of $C$. If the mappings $C \mapsto U, \tilde U_L$ are measurable, then $U$ and
$\tilde U_L$ are random elements on $(\Omega, \mathcal{F}, P)$. Third, properties of $U$ are approximated by
those of $\tilde U_L$ in (9.81), which can be estimated efficiently from samples of $C$ since the
functional form of $\tilde U_L$ is available. The representation of $U$ in (9.81) accounts not
only for the expressions of $U$ at $\{\tilde c_k\}$, as for SROM-based solutions, but also for the
rate of change of $U$ with the coordinates of $C$ in vicinities of $\{\tilde c_k\}$.
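The evaluation of (9.81) at a sample of $C$ requires only a nearest-center search and one inner product. The sketch below is a minimal illustration at a fixed spatial point, assuming the Euclidean distance defines the Voronoi cells; the toy response $U(c) = c_1^2 + c_2$, the centers, and the function names are hypothetical.

```python
from typing import List, Sequence


def nearest_center(c: Sequence[float], centers: Sequence[Sequence[float]]) -> int:
    """Index k of the Voronoi cell containing c (ties go to the lowest index)."""
    return min(
        range(len(centers)),
        key=lambda k: sum((ci - ck) ** 2 for ci, ck in zip(c, centers[k])),
    )


def esrom_value(c: Sequence[float], centers: Sequence[Sequence[float]],
                values: Sequence[float], gradients: Sequence[Sequence[float]]) -> float:
    """Piecewise linear surrogate (9.81): u_k + grad u_k . (c - c_k) on cell k."""
    k = nearest_center(c, centers)
    return values[k] + sum(
        g * (ci - ck) for g, ci, ck in zip(gradients[k], c, centers[k])
    )


# Toy response U(c) = c_1^2 + c_2 with exact values and gradients at m = 2 centers.
centers: List[List[float]] = [[0.0, 0.0], [1.0, 1.0]]
values = [0.0, 2.0]                    # U at the centers
gradients = [[0.0, 1.0], [2.0, 1.0]]   # (dU/dc_1, dU/dc_2) at the centers

approx = esrom_value([0.9, 1.2], centers, values, gradients)  # exact value is 2.01
```

Statistics of the surrogate follow by averaging `esrom_value` over samples of $C$; no additional deterministic solutions are needed.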

The following example uses ESROMs to solve a version of the stochastic differential
equation (9.53). Since $a(x)$ in (9.53), as any other random field, consists of an
uncountable number of random variables, it is represented approximately by a linear
parametric model with dependent coefficients (Sect. 6.3.3.2).

Example 9.14 Let $U(x)$ be the solution of the stochastic partial differential equation
(9.53) with $a(x)$ in (9.62), (9.63), and (9.64). Let

$$
\tilde a(x, C) = \sum_{i,j=0}^{n^*} A_{i,j}\, T_i(x_1)\, T_j(x_2) = \sum_{r=1}^{n} C_r\, \varphi_r(x), \quad x = (x_1, x_2) \in D,
\tag{9.82}
$$

be a linear parametric model with dependent coefficients (Sect. 6.3.3.2) for $a(x)$,
where $\{T_j(\cdot)\}$ are modified Chebyshev polynomials defined in $[0, 1]$ by the recurrence
formula $T_{j+1}(\xi) = 2(2\xi - 1)\, T_j(\xi) - T_{j-1}(\xi)$, $j = 1, 2, \ldots$, with $T_0(\xi) =
1$ and $T_1(\xi) = 2\xi - 1$, the functions $\{\varphi_r(x)\}$ are products of Chebyshev polynomials
in $x_1$ and $x_2$, and $C$ is an $\mathbb{R}^n$-valued random variable that is defined on a probability
space $(\Omega, \mathcal{F}, P)$ and collects the random coefficients $\{A_{i,j}\}$. Under the approximation
$a(x) \simeq \tilde a(x, C)$, the stochastic differential equation (9.53) defines a mapping
$C \mapsto U$. ESROM-based solutions constitute approximations for this mapping.
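The recurrence for the modified Chebyshev polynomials and the double sum in (9.82) can be sketched as follows. The arguments are assumed to be already scaled to $[0, 1]$ (the scaling of the physical coordinates is not specified here), and the coefficient array and function names are hypothetical.

```python
from typing import Sequence


def shifted_chebyshev(j: int, xi: float) -> float:
    """T_j(xi) on [0, 1] from T_{j+1} = 2(2 xi - 1) T_j - T_{j-1}, T_0 = 1, T_1 = 2 xi - 1."""
    t_prev, t_curr = 1.0, 2.0 * xi - 1.0
    if j == 0:
        return t_prev
    for _ in range(j - 1):
        t_prev, t_curr = t_curr, 2.0 * (2.0 * xi - 1.0) * t_curr - t_prev
    return t_curr


def parametric_field(xi1: float, xi2: float,
                     coeffs: Sequence[Sequence[float]]) -> float:
    """Double sum in (9.82): sum_{i,j} A_{i,j} T_i(xi1) T_j(xi2) for one sample of A."""
    return sum(
        a_ij * shifted_chebyshev(i, xi1) * shifted_chebyshev(j, xi2)
        for i, row in enumerate(coeffs)
        for j, a_ij in enumerate(row)
    )


# Hypothetical coefficient sample with n* = 1, that is, four coefficients A_{i,j}.
sample_coeffs = [[3.0, 0.5], [1.0, 0.0]]
field_value = parametric_field(1.0, 0.5, sample_coeffs)  # T_1(1) = 1 and T_1(0.5) = 0
```

With $n^* = 4$ the double sum has $(n^* + 1)^2 = 25$ terms, which matches the stochastic dimension $n = 25$ used in the numerical results of this example.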

As previously mentioned, the first step of the ESROM method is the construction
of SROMs for the uncertain parameters in the definition of a stochastic equation, that
is, the random vector $C$ for our case. Since the probability law of $C$ is not known,
samples of this vector are used to construct SROMs $\tilde C$ for $C$. We calculate samples
of $C$ by minimizing the objective function in (6.48) that quantifies the discrepancy
between $a(x)$ and $\tilde a(x, C)$. The analysis is based on $n_s = 1000$ independent samples
of $C$.

Numerical results are reported for $l_1 = 12$, $l_2 = 6$, $\alpha = 10$, $\beta = 100$, $p = 1$,
$q = 3$, $\rho = 0.7$, and stochastic dimension $n = 25$. Using an algorithm in [34],
$n_s = 1000$ independent, equally likely samples $\{c_i\}$, $i = 1, \ldots, n_s$, of $C$ have
been obtained by minimizing the discrepancy between samples of $a(x)$ and $\tilde a(x, C)$.
It is assumed that $C$ is completely described by the samples $\{c_i\}$, $i = 1, \ldots, n_s$.
This characterization of $C$ is used to construct SROMs $\tilde C$ for $C$ and Monte Carlo
estimates for properties of $U$. The construction of Monte Carlo estimates requires
$n_s = 1000$ solutions of distinct deterministic versions of (9.53) corresponding to
$C = c_i$, $i = 1, \ldots, n_s$.

The construction of the piecewise linear representation $\tilde U_L$ of $U$ requires finding
the deterministic functions $\{\tilde u_k\}$ and $\{\nabla \tilde u_k\}$, $k = 1, \ldots, m$, that is, the functions
$U$ for $m$ points $\{\tilde c_k\}$ in the range $\Gamma = C(\Omega)$ of $C$ and the gradients of $U$ with
respect to the coordinates of $C$ at $\{\tilde c_k\}$. The expansion points for $\tilde U_L$ are the samples
$\{\tilde c_k\}$, $k = 1, \ldots, m$, of a SROM $\tilde C$ for $C$ that are extracted from the set $\{c_i\}$,
$i = 1, \ldots, n_s$, by an optimization algorithm [19]. The functions $\{\tilde u_k\}$ are solutions
of deterministic versions of (9.53) with $\{\tilde c_k\}$ in place of $C$. The coordinates $V_r(x, C) =
\partial U(x, C)/\partial C_r$, $r = 1, \ldots, n$, of the gradients of $U$ at $\{\tilde c_k\}$ are obtained from the
differential equations

$$
\nabla \cdot \big( \tilde a(x, C)\, \nabla V_r(x, C) \big) = -\nabla \cdot \Big( \frac{\partial \tilde a(x, C)}{\partial C_r}\, \nabla U(x, C) \Big), \quad r = 1, \ldots, n,
\tag{9.83}
$$

with $\tilde c_k$, $k = 1, \ldots, m$, in place of $C$. The latter equation is derived from (9.53) by
differentiation with respect to the coordinates $C_r$, $r = 1, \ldots, n$, of $C$. Since (9.83)
and (9.53) have the same differential operator, their solutions for $n + 1$ distinct right
sides involve a single inversion of their common operator, so that only $m$ distinct
deterministic operators need to be inverted. This observation increases significantly
the efficiency of the finite element algorithm used for calculations. Since the mapping
$C \mapsto \tilde U_L$ is known and has a simple functional form, statistics of $\tilde U_L$ can be obtained
from samples of $C$ with a minimum computational effort. The Euclidean distance in
$\mathbb{R}^n$ is used to assign the samples $\{c_i\}$ of $C$ to the Voronoi cells $\{\Gamma_k\}$. For example, $c_i$
is assigned to $\Gamma_k$ if $\|c_i - \tilde c_k\| < \|c_i - \tilde c_l\|$ for all $l \ne k$. If $\|c_i - \tilde c_k\| = \|c_i - \tilde c_l\|$, $c_i$
can be assigned to either $\Gamma_k$ or $\Gamma_l$. Note that it is not necessary to construct Voronoi
tessellations explicitly for the range $\Gamma = C(\Omega)$ of $C$ to implement ESROM-based
solutions.

Fig. 9.22 An approximate measure for cell size corresponding to SROMs with $m = 15$ (left panel)
and $m = 25$ (right panel)

Fig. 9.23 Measures $g_1(k)$ and $g_2(k)$ of local gradients $\nabla \tilde u_k(x)$ for $m = 15$ (left panel) and $m = 25$
(right panel)

The approximate measure $\max_{c_i \in \Gamma_k} \|c_i - \tilde c_k\|$ for the size of the Voronoi cells is
shown in the left and right panels of Fig. 9.22 for SROMs with $m = 15$ and $m = 25$,
respectively. As expected, the cells for $m = 25$ are smaller on average than those for
$m = 15$. The magnitude of the gradients of $U(x, C)$ at $\{\tilde c_k\}$ is quantified by

$$
g_1(k) = \max_{1 \le r \le n}\ \max_{x \in D}\ |\partial \tilde u_k(x)/\partial C_r| \quad \text{and} \quad
g_2(k) = \Big( \frac{1}{\text{vol}(D)} \int_D \|\nabla \tilde u_k(x)\|^2\, dx \Big)^{1/2}
$$

and shown in Fig. 9.23 for $m = 15$ (left panel) and $m = 25$ (right panel). Since the
cells $\{\Gamma_k\}$ have similar sizes and the gradients $\{\nabla \tilde u_k\}$ have similar magnitudes, we
conclude that there is no need to refine the partition $\{\Gamma_k\}$ of $\Gamma$.

Fig. 9.24 Approximations of $E[U(x, C)]$ based on a SROM with $m = 15$ (left panel), a SROM
with $m = 25$ (middle panel), and Monte Carlo simulation (right panel)

Fig. 9.25 Approximations of $\text{Std}[U(x, C)]$ based on a SROM with $m = 15$ (left panel), a SROM
with $m = 25$ (middle panel), and Monte Carlo simulation (right panel)

Figure 9.24 shows approximations for $E[U(x, C)]$ based on a SROM with $m = 15$
(left panel), a SROM with $m = 25$ (middle panel), and Monte Carlo simulation
(right panel). The Monte Carlo estimates are based on 1000 samples of $U(x, C)$.
Standard deviations of $U(x, C)$ are in Fig. 9.25. The plots in the left, middle, and
right panels are based on a SROM with $m = 15$, a SROM with $m = 25$, and
Monte Carlo simulation using 1000 samples of $U(x, C)$. The largest discrepancies
between the means of $U(x, C)$ by SROMs and the Monte Carlo estimates are
0.012 for $m = 15$ and $5.5 \times 10^{-3}$ for $m = 25$. Corresponding discrepancies
for standard deviations are 0.095 for $m = 15$ and $4.8 \times 10^{-3}$ for $m = 25$. The
increase of the model size from $m = 15$ to $m = 25$ improves significantly the
accuracy of the SROM-based approximations for $E[U(x, C)]$ and $\text{Std}[U(x, C)]$.
The thin solid lines in Fig. 9.26 coincide and give a Monte Carlo estimate for the
distribution $F_0(u) = P(U(x_0, C) \le u)$ of $U(x, C)$ at $x_0 = (l_1/2, l_2/2)$ obtained
from $n_s = 1000$ independent samples of $U(x_0)$. The heavy dotted lines in the figure
are approximations of $F_0(u)$ corresponding to ESROM-based solutions with $m = 15$
(left panel) and $m = 25$ (right panel). ◊

Fig. 9.26 A Monte Carlo estimate for $F_0(u) = P(U(x_0, C) \le u)$, $x_0 = (l_1/2, l_2/2)$, (thin solid
lines) and approximations of $F_0(u)$ based on SROMs with $m = 15$ (left panel) and $m = 25$ (right
panel)

9.4.8 Stochastic Galerkin Method 

The Galerkin method has been used extensively in applications to solve a broad range 
of stochastic problems in physics and engineering [25]. We review essentials of the 
Galerkin method and evaluate its performance. The stochastic elliptic boundary value 
problem (9.53) is the model problem considered in our discussion. 

Let $a(x)$ and $f(x)$, $x \in D$, in (9.53) be real-valued random fields defined on a
probability space $(\Omega, \mathcal{F}, P)$. Under some conditions stated in Sect. 9.4.8.3, $a(x)$
and $f(x)$ admit Karhunen-Loève (KL) representations and (9.53) has a unique weak
solution. The weak solution $U(x, \omega)$ of this equation satisfies (9.54) and belongs to
$W(D, \Omega)$ defined in (9.55). Generally, numerical methods need to be employed to
solve (9.53), since this equation can be solved analytically only in special cases.

The implementation of numerical methods requires the discretization of both the 
probability space and the physical space. Methods for discretizing these spaces are 
discussed in the following two sections. Subsequent sections deal with the existence 
and uniqueness of Galerkin solutions and the construction of parametric models for 
the random elements in (9.53). Numerical examples conclude our discussion on the 
stochastic Galerkin method. 

9.4.8.1 Probability Space Discretization 

The discretization of probability space involves two steps. First, the random fields 
in the definition of a stochastic problem are approximated by parametric models, 
that is, deterministic functions of space and/or time arguments depending on a finite 
number of random variables. Second, the random variables in the definition of a 
parametric model are approximated by their projections on subspaces of the space
of random variables with finite variance.

First step. Random functions, that is, uncountable families of random variables,
are approximated by parametric random functions. We consider parametric models
given by truncated KL representations and by parametric translation functions. Parametric
models can also be constructed from a sampling theorem for random functions
whose spectral densities have bounded support ([26] and Sect. A.1.3 in this book).

Consider first KL-based parametric models. Suppose that the random fields $a(x)$
and $f(x)$ in (9.53) are defined on a probability space $(\Omega, \mathcal{F}, P)$ and that they admit
KL representations. Truncated versions of these representations have the form

$$
a(x) \simeq a(x, Z) = \sum_{k=1}^{m} a_k(x)\, Z_k, \qquad
f(x) \simeq f(x, Z) = \sum_{k=1}^{m} f_k(x)\, Z_k, \quad x \in D,
\tag{9.84}
$$

where $Z = (Z_1, \ldots, Z_m)$ is an $\mathbb{R}^m$-valued random variable defined on $(\Omega, \mathcal{F}, P)$
with uncorrelated but, generally, dependent coordinates. The deterministic functions
$\{a_k(x)\}$ and $\{f_k(x)\}$ result from the KL expansions of these fields. We refer to (9.84) as
linear parametric models since they are finite sums of known deterministic functions
of $x \in D$ with random coefficients. If $a(x)$ and $f(x)$ are dependent, their KL expansions
need to be constructed jointly. If $a(x)$ and $f(x)$ are independent, their parametric
models will depend on disjoint sets of coordinates of $Z$ that are independent. For
simplicity, we use the formulas in (9.84) irrespective of whether $a(x)$ and $f(x)$ are or
are not independent.

The parametric models $a(x, Z)$ and $f(x, Z)$ in (9.84) are essential for the solution
of stochastic problems since they depend on a finite number $m < \infty$ of random variables. The
KL-based linear parametric models are partially specified, in the sense that only
the first two moments of $Z$ and, therefore, of $a(x, Z)$ and $f(x, Z)$ are known. This
characterization is insufficient to generate samples of $a(x, Z)$ and $f(x, Z)$ and to establish
conditions for the existence and uniqueness of solutions of stochastic problems. It is
common to augment the partial characterization of $Z$ by assuming that its coordinates
are independent and follow specified distributions such that (9.53) with $a(x, Z)$ in
place of $a(x)$ is elliptic almost surely [16, 20, 25, 27-29, 30-33]. The selection of the
probability law for $Z$ based solely on mathematical considerations results in parametric
models for $a(x)$ and $f(x)$ that may be of limited practical value (Sect. 6.3.3.1).
We note that linear parametric models for $a(x)$ and $f(x)$ can be constructed such that
they satisfy both mathematical and physical constraints ([34] and Sect. 6.3.3.2 in this
book).

The following example shows an additional limitation of KL representations.
It relates to the fact that random functions with identical second moment properties
may have very different sample properties. These differences may be amplified by
the solutions of SPDEs since they constitute nonlinear mappings of their random
coefficients.

Example 9.15 The solution of the SPDE d(a(x)U'(x))/dx = 0, x ∈ (0, l), with boundary
conditions U(0) = 0 and U(l) = 1 is U(x) = ∫_0^x K(u)du / ∫_0^l K(u)du, where
a(x) > 0 a.s. denotes a random field and K(x) = 1/a(x) is assumed to have finite
variance. We consider two models for K(x), the random fields K_1(x) and K_2(x).

Fig. 9.27 Estimates of scaled covariance functions for K_1(x) (solid line) and
K_2(x) (dotted line) for α = 0.1, β = 6, and λ = 5

Fig. 9.28 Mean and standard deviations of U(x) for K(x) = K_1(x) (solid lines) and K(x) = K_2(x)
(dotted lines) for α = 0.1, β = 6, and λ = 5


The first model is the translation field K_1(x) = α + (β − α)Φ(G(x)), where
0 < α < β < ∞ and G(x) is a homogeneous Gaussian field with mean 0, variance
1, and covariance function E[G(x)G(y)] = exp(−λ|x − y|), λ > 0. The mean and
variance of K_1(x) are μ = (α + β)/2 and Var[K_1(x)] = (β − α)^2/12.

The covariance function of K_1(x) can be calculated from its definition and the
probability law of G(x). The covariance of K_1(x) scaled by its variance is approximately
equal to that of G(x), and we assume it coincides with the covariance function
of G(x).

The second model is a homogeneous random field K_2(x) whose samples are
piecewise constant with jumps at Poisson points spaced on average at 1/λ. The
marginal density of K_2(x) is f(z) = 1(μ − η − ε < z < μ − η + ε)/(4ε) +
1(μ + η − ε < z < μ + η + ε)/(4ε), 0 < ε < η, so that its mean and variance
are (α + β)/2 and ε^2/3 + η^2. Under the conditions Var[K_1(x)] = Var[K_2(x)] and
η^2 = 0.99 Var[K_1(x)], the random fields K_1(x) and K_2(x) have the same second
moment properties (Exercise 9.13), so that they admit the same KL representation.

Numerical results in the following figures are for α = 0.1, β = 6, and λ = 5.
Figure 9.27 shows with solid and dotted lines estimates of the scaled covariance
functions of K_1(x) and K_2(x). Figure 9.28 shows with solid and dotted lines the
means and standard deviations of the solutions U_i(x) corresponding to K(x) =
K_i(x), i = 1, 2. The two solutions have similar expectations but their standard
deviations differ significantly. In summary, the random fields K_1(x) and K_2(x) are
equal in the second moment sense, so that they admit the same KL representation,
implying that the resulting stochastic Galerkin solutions for these random coefficients
coincide. Yet, the standard deviations of U_1(x) and U_2(x) corresponding to these
random coefficients differ significantly. ◊
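The comparison in Example 9.15 can be explored numerically. The following is a minimal Monte Carlo sketch, not the calculation used for Figs. 9.27 and 9.28. It assumes a domain length l = 1, which the example does not state, and it assumes that the value of K_2(x) on each constant piece is an independent draw from the bimodal marginal density. The function names `sample_K1`, `sample_K2`, and `solve_U`, the grid, and the sample size are illustrative choices.

```python
import numpy as np
from math import erf, sqrt

def normal_cdf(g):
    """Standard normal distribution function, applied elementwise."""
    return 0.5 * (1.0 + np.vectorize(erf)(np.asarray(g) / sqrt(2.0)))

def sample_K1(rng, x, alpha, beta, lam, n_samples):
    """Samples of K1(x) = alpha + (beta - alpha) * Phi(G(x)) on the grid x.

    G has covariance exp(-lam |x - y|), so on a grid it is a Gaussian AR(1) sequence.
    """
    g = np.empty((n_samples, x.size))
    g[:, 0] = rng.standard_normal(n_samples)
    for i in range(1, x.size):
        rho = np.exp(-lam * (x[i] - x[i - 1]))
        g[:, i] = rho * g[:, i - 1] + sqrt(1.0 - rho**2) * rng.standard_normal(n_samples)
    return alpha + (beta - alpha) * normal_cdf(g)

def sample_K2(rng, x, mu, eta, eps, lam, n_samples):
    """Samples of the piecewise constant field K2(x) with Poisson jump points.

    Assumption: each constant piece takes an independent value drawn from the
    bimodal density, uniform on (mu - eta -/+ eps) or (mu + eta -/+ eps).
    """
    out = np.empty((n_samples, x.size))
    length = x[-1] - x[0]
    for s in range(n_samples):
        n_jumps = rng.poisson(lam * length)
        jumps = np.sort(rng.uniform(x[0], x[-1], n_jumps))
        levels = (mu + eta * rng.choice([-1.0, 1.0], n_jumps + 1)
                  + rng.uniform(-eps, eps, n_jumps + 1))
        out[s] = levels[np.searchsorted(jumps, x, side="right")]
    return out

def solve_U(K, x):
    """U(x) = int_0^x K du / int_0^l K du by the trapezoidal rule, row by row."""
    steps = 0.5 * (K[:, 1:] + K[:, :-1]) * np.diff(x)
    cum = np.concatenate([np.zeros((K.shape[0], 1)), np.cumsum(steps, axis=1)], axis=1)
    return cum / cum[:, -1:]

alpha, beta, lam, l = 0.1, 6.0, 5.0, 1.0       # l = 1 is an assumed domain length
var_K = (beta - alpha) ** 2 / 12.0             # common variance of K1 and K2
mu, eta, eps = (alpha + beta) / 2.0, sqrt(0.99 * var_K), sqrt(0.03 * var_K)

rng = np.random.default_rng(1)
x = np.linspace(0.0, l, 201)
U1 = solve_U(sample_K1(rng, x, alpha, beta, lam, 2000), x)
U2 = solve_U(sample_K2(rng, x, mu, eta, eps, lam, 2000), x)
print("std of U at x = l/2:", U1[:, 100].std(), U2[:, 100].std())
```

The value ε = (0.03 Var[K_1(x)])^{1/2} follows from ε^2/3 + η^2 = Var[K_1(x)] with η^2 = 0.99 Var[K_1(x)].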

Consider now parametric translation models. Suppose that a(x) in (9.53) is a
weakly homogeneous random field specified partially by its marginal distribution F
and correlation function r_a(y) = E[a(x + y)a(x)]. The homogeneity assumption
can be relaxed. Let

$$
a(x) \simeq a_T(x) = F^{-1} \circ \Phi(G(x)) = h(G(x)), \qquad x \in D, \tag{9.85}
$$

be a translation field, where G(x) is a stationary Gaussian field with mean 0,
variance 1, and covariance function ρ(y) = E[G(x + y)G(x)]. The correlation
function of a_T(x) is

$$
r_{a_T}(y) = E[a_T(x + y)\, a_T(x)] = E[h(G(x + y))\, h(G(x))]
= \int_{\mathbb{R}^2} h(u)\, h(v)\, \phi(u, v; \rho(y))\, du\, dv \tag{9.86}
$$

and depends on ρ(y) and the mapping in (9.85), where φ(·, ·; λ) denotes the joint
density of two correlated N(0, 1) variables with correlation coefficient λ. The marginal
distribution of a_T(x) is F irrespective of the covariance function of G(x), so that, if
the support of F is a bounded interval [α, β], 0 < α < β < ∞, the samples of
a_T(x) are in [α, β] with probability 1. On the other hand, there may not exist ρ(y)
such that r_{a_T}(y) = r_a(y) ([22], Sect. 3.1). Optimization algorithms can be used to
select ρ(y) such that the discrepancy between r_{a_T}(y) and r_a(y) is minimized in some
sense (Sect. 3.8.2).
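The dependence of the correlation function of a translation field on the correlation of its Gaussian image can be evaluated by quadrature. The sketch below is an illustration under stated assumptions rather than a procedure taken from the text: it takes F uniform on [α, β], so that h(g) = α + (β − α)Φ(g), writes the second Gaussian variable as λ G_1 + (1 − λ^2)^{1/2} W with W independent of G_1, and uses a tensor Gauss-Hermite rule. The function name `translation_correlation` and the number of quadrature nodes are hypothetical choices.

```python
import numpy as np
from math import erf, sqrt

def normal_cdf(g):
    """Standard normal distribution function, applied elementwise."""
    return 0.5 * (1.0 + np.vectorize(erf)(np.asarray(g) / sqrt(2.0)))

def translation_correlation(h, lam, n_quad=40):
    """E[h(G1) h(G2)] for standard normal G1, G2 with correlation coefficient lam."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_quad)
    weights = weights / weights.sum()            # weights of the N(0, 1) law
    u = nodes[:, None]                           # values of G1
    w = nodes[None, :]                           # values of the independent variable W
    v = lam * u + sqrt(1.0 - lam**2) * w         # values of G2
    return float(np.sum(weights[:, None] * weights[None, :] * h(u) * h(v)))

alpha, beta = 0.1, 6.0
h = lambda g: alpha + (beta - alpha) * normal_cdf(g)

mean = (alpha + beta) / 2.0
var = (beta - alpha) ** 2 / 12.0
for lam in (0.0, 0.5, 0.9):
    scaled_cov = (translation_correlation(h, lam) - mean**2) / var
    print(lam, scaled_cov)
```

For a uniform marginal the scaled covariance is known in closed form, (6/π) arcsin(λ/2), which stays close to λ and provides a check on the quadrature.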

Parametric translation models for a(x) are given by the translation model
in (9.85) with G^{(n)}(x) in place of G(x), that is, the random field

$$
a_T^{(n)}(x) = F^{-1} \circ \Phi(G^{(n)}(x)) = h(G^{(n)}(x)), \qquad x \in D, \tag{9.87}
$$

where G^{(n)}(x), n = 1, 2, ..., is a sequence of homogeneous parametric Gaussian
fields with mean E[G^{(n)}(x)] = 0, variance E[G^{(n)}(x)^2] = 1, and covariance function
ρ^{(n)}(y) = E[G^{(n)}(x + y)G^{(n)}(x)]. Discrete spectral representations or truncated
KL expansions (Sects. 3.6.4 and 3.6.5) can be used to construct the fields {G^{(n)}(x)}
such that the covariance function of G^{(n)}(x) converges to that of G(x) as n → ∞.

Theorem 9.6 The parametric translation model a_T^{(n)}(x) in (9.87) is a homogeneous
field with marginal distribution F that becomes a version of a_T(x) as n → ∞. If
G^{(n)}(x) has continuous samples and the mapping G(x) ↦ a_T(x) = h(G(x)) in
(9.87) is continuous, then a_T^{(n)}(x) has a.s. continuous samples.

Proof The finite dimensional distributions of a_T^{(n)}(x) are P(∩_{i=1}^q {a_T^{(n)}(x_i) ≤ z_i}) =
P(∩_{i=1}^q {G^{(n)}(x_i) ≤ Φ^{-1} ∘ F(z_i)}), where q ≥ 1 is an arbitrary integer and z_i ∈ ℝ.
Since G^{(n)}(x) is weakly homogeneous and Gaussian, it is a homogeneous Gaussian
field, so that a_T^{(n)}(x) is also homogeneous.

By the normal comparison lemma ([35], Theorem 4.2.1) we have

$$
\begin{aligned}
&\Big| P\Big(\bigcap_{i=1}^{q} \{a_T^{(n)}(x_i) \le z_i\}\Big) - P\Big(\bigcap_{i=1}^{q} \{a_T(x_i) \le z_i\}\Big) \Big| \\
&\quad = \Big| P\Big(\bigcap_{i=1}^{q} \{G^{(n)}(x_i) \le \xi_i\}\Big) - P\Big(\bigcap_{i=1}^{q} \{G(x_i) \le \xi_i\}\Big) \Big| \\
&\quad \le \frac{1}{2\pi} \sum_{1 \le i < j \le q} \big| \rho^{(n)}(x_i - x_j) - \rho(x_i - x_j) \big|\, (1 - \alpha_{ij}^2)^{-1/2}
\exp\Big( -\frac{(\xi_i^2 + \xi_j^2)/2}{1 + \alpha_{ij}} \Big),
\end{aligned}
$$

where α_{ij} = max(|ρ^{(n)}(x_i − x_j)|, |ρ(x_i − x_j)|) and ξ_i = Φ^{-1} ∘ F(z_i). The
convergence of the covariance function of G^{(n)}(x) to that of G(x) as n → ∞ implies
P(∩_{i=1}^q {a_T^{(n)}(x_i) ≤ z_i}) → P(∩_{i=1}^q {a_T(x_i) ≤ z_i}), that is, the finite dimensional
distributions of a_T^{(n)}(x) converge to those of a_T(x), so that a_T^{(n)}(x) becomes a version
of a_T(x) as n → ∞. If the samples G^{(n)}(x, ω) of G^{(n)}(x) are continuous functions
for ω ∈ Ω \ Ω_0, P(Ω_0) = 0, then a_T^{(n)}(x, ω) = h(G^{(n)}(x, ω)) are continuous
functions by the postulated continuity of the mapping in (9.87). ▲

Second step. We have seen that parametric models of the type in (9.84) can be
constructed for the random fields a(x) and f(x) in the definition of the stochastic
boundary value problem defined by (9.53). Also, it is possible to construct a measurable
mapping g : ℝ^m → Γ ⊂ ℝ^m relating Z to an m-dimensional random
vector G = (G_1, ..., G_m) defined on the same probability space (Ω, ℱ, P) since
Z = g(G) by (8.30). The coordinates of Z can be approximated by

$$
Z_k = g_k(G) \simeq \sum_{i=0}^{n_{pc}} \alpha_{k,i}\, \psi_i(G), \qquad k = 1, \ldots, m, \tag{9.88}
$$

where {α_{k,i}} are coefficients, {ψ_i(G)} denote polynomial chaoses (Sect. B.6.2), and
n_pc is the largest degree of polynomial chaoses in the representation of Z_k. This
representation and (9.84) give


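$$
a(x) \simeq a(x, Z) \simeq \sum_{i=0}^{n_{pc}} \Big[ \sum_{k=1}^{m} \alpha_{k,i}\, a_k(x) \Big] \psi_i(G), \qquad
f(x) \simeq f(x, Z) \simeq \sum_{i=0}^{n_{pc}} \Big[ \sum_{k=1}^{m} \alpha_{k,i}\, f_k(x) \Big] \psi_i(G), \qquad x \in D. \tag{9.89}
$$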


Example 9.16 Let Z ~ U(0, 1) and G ~ N(0, 1) so that Z = g(G) = Φ(G).
Let Z ≃ Z^{(n_pc)} = Σ_{k=0}^{n_pc} a_k H_k(G), where a_k and H_k are coefficients and Hermite
polynomials (Sect. B.6.1). Since H_k are polynomials and G takes values over the
entire real line, the support of Z^{(n_pc)} is also the real line, so that the distributions of
Z^{(n_pc)} and Z differ significantly. The probability law of G and the truncated mapping
G ↦ Z^{(n_pc)} are inconsistent with the distribution Z ~ U(0, 1) postulated for Z.
Alternative polynomials and distributions need to be considered (Example 8.14). ◊
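The mismatch between the supports of Z and its truncated Hermite representation can be checked directly. The sketch below assumes probabilists' Hermite polynomials He_k, for which the expansion coefficients are a_k = E[Φ(G) He_k(G)]/k!, and evaluates these expectations by Gauss-Hermite quadrature. The function name `pc_coefficients`, the truncation level n_pc = 3, and the sample size are illustrative choices, not values from the text.

```python
import numpy as np
from math import erf, sqrt, factorial
from numpy.polynomial import hermite_e as He

def normal_cdf(g):
    """Standard normal distribution function, applied elementwise."""
    return 0.5 * (1.0 + np.vectorize(erf)(np.asarray(g) / sqrt(2.0)))

def pc_coefficients(g_map, n_pc, n_quad=60):
    """Coefficients a_k = E[g(G) He_k(G)] / k!, k = 0, ..., n_pc, for G ~ N(0, 1)."""
    nodes, weights = He.hermegauss(n_quad)
    weights = weights / weights.sum()                    # weights of the N(0, 1) law
    values = g_map(nodes)
    coeffs = []
    for k in range(n_pc + 1):
        he_k = He.hermeval(nodes, [0.0] * k + [1.0])     # He_k at the quadrature nodes
        coeffs.append(np.sum(weights * values * he_k) / factorial(k))
    return np.array(coeffs)

a = pc_coefficients(normal_cdf, n_pc=3)

rng = np.random.default_rng(0)
g = rng.standard_normal(100_000)
z_pc = He.hermeval(g, a)                                 # samples of the truncated series
frac_outside = np.mean((z_pc < 0.0) | (z_pc > 1.0))
print("coefficients:", a)
print("fraction of samples outside [0, 1]:", frac_outside)
```

A strictly positive fraction of samples falls outside [0, 1], although Z = Φ(G) takes values in (0, 1) with probability 1.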

Note that the parametric translation models (9.87) can also be used to construct 
polynomial chaos representations for the random functions in the definition of sto- 
chastic equations similar to those in (9.88), since they are nonlinear functions of 
x e D depending on a finite number of independent Gaussian variables. 


9.4.8.2 Physical Space Discretization 


The weak solution U(x) of (9.53) belongs to 𝒲(D, Ω), so that U and ∇U are in
L^2(D × Ω). Since U(·, ω) has an infinite number of degrees of freedom, we construct
approximations for this function in a subspace of L^2(D) spanned by a finite family
of functions (ξ_1(x), ..., ξ_{n_e}(x)), where ξ_j : D → ℝ, j = 1, ..., n_e, are square
integrable in D.

The numerical solution of (9.53) is sought in a finite dimensional subspace of
𝒲(D, Ω) spanned by {ψ_i(G) ξ_j(x)}, i = 0, 1, ..., n_pc, j = 1, ..., n_e, so that the
projection of U(x, ω) on this subspace has the form

$$
U(x, \omega) \simeq \sum_{i=0}^{n_{pc}} \sum_{j=1}^{n_e} u_{i,j}\, \psi_i(G(\omega))\, \xi_j(x)
= \sum_{j=1}^{n_e} \Big[ \sum_{i=0}^{n_{pc}} u_{i,j}\, \psi_i(G(\omega)) \Big] \xi_j(x), \tag{9.90}
$$

where (x, ω) ∈ D × Ω, {u_{i,j}} are coefficients that need to be determined, and
{ξ_j(x)} can be interpolators corresponding to a finite element partition of D with n_e
nodes. The square brackets in this equation are truncated PC representations for the
solutions at the nodes of the finite element partition of D.

The weak form of (9.53) in (9.54) with U(x, ω) in (9.90) and {ξ_k ψ_l} in place of
W becomes

$$
\sum_{j=1}^{n_e} \sum_{i=0}^{n_{pc}} u_{i,j}\, \mathcal{B}(\xi_j \psi_i, \xi_k \psi_l) = (f, \xi_k \psi_l), \qquad
k = 1, \ldots, n_e, \quad l = 0, 1, \ldots, n_{pc}, \tag{9.91}
$$

where a in the expression of ℬ and f are replaced by their approximations in (9.89).
The latter equalities constitute a system of equations for the unknown coefficients
{u_{i,j}} in the approximate representation of U(x, ω). Once {u_{i,j}} have been obtained,
the expression of U(x, ω) in (9.90) becomes known, so that its probabilistic characteristics
can be estimated efficiently by Monte Carlo simulation.


9.4.8.3 Solution Existence and Properties 

Let U(x, ω) be the solution of the stochastic boundary value problem in (9.53)
defined on the product space (D × Ω, ℬ(D) × ℱ, λ × P), where D is a bounded
subset of ℝ^d with boundary ∂D (Sect. 9.4.3). It is assumed that (1) the random
fields a(x) and f(x) are measurable in both arguments, (2) the correlation functions
of a(x) and f(x) are continuous in D × D, (3) the random field a(x) is uniformly
bounded P-a.s. in D, that is, P(a(x) ∈ [α, β], x ∈ D) = 1 for 0 < α < β < ∞,
(4) the field a(x) has uniformly bounded and continuous derivative, that is, P(a ∈
C^1(D), sup_{x∈D} |∇a(x)| < c) = 1 for a constant c > 0, and (5) the source term f is
square integrable P-a.s. in D, that is, E[∫_D f(x)^2 dx] < ∞ [20].

The first assumption allows the interchange of the order of integration in the physical
and probability spaces by Fubini's theorem (Sect. 2.6). Assumption (2) guarantees
the existence of KL expansions for the random fields a(x) and f(x) (Sect. 3.6.5). The
bilinear form ℬ(U, W), U, W ∈ 𝒲(D, Ω), in (9.56) is continuous and elliptic by
assumption (3). Assumption (4) is needed to ensure the regularity of the solution U(x).
The linear functional (f, W)_{L^2(D×Ω)} of W is bounded by assumption (5). Under
these assumptions, the weak form of (9.53) in (9.54) admits a unique solution by
the Lax-Milgram theorem (Sect. B.4.2). Moreover, bounds can be established on the
discrepancy between solutions of distinct elliptic stochastic boundary value problems
satisfying the above assumptions (Theorem 9.4). Alternative bounds holding
under various assumptions can be found in [28, 20].

We have seen that the discretization of the probability space involves the representation
of the random fields a(x) and f(x) in the definition of (9.53) by parametric
models depending on a random vector Z = (Z_1, ..., Z_m), m < ∞, and of Z by truncated
polynomial chaos series. It is common to assume that the coordinates {Z_k} of Z
are independent random variables taking values in bounded intervals Γ_k = Z_k(Ω),
so that the support of Z is the rectangle Γ = ×_{k=1}^m Γ_k in ℝ^m. Accordingly,
the stochastic variational problem in (9.54) can be viewed as a deterministic variational
problem



(9.92) 


where p(z) denotes the density of Z, the functions a, f, and U are viewed as depending
on x and z, and 𝒲_p(D, Γ) is the space in (9.55) restricted to ℝ^m-valued random
variables with range Γ that are mean square integrable with respect to p.

It is assumed without loss of generality that Γ_k are unit intervals on the real line
and that the law of Z coincides with the Lebesgue measure, that is, the coordinates
{Z_k} of Z are independent U(−1/2, 1/2) variables. Let 𝒫_{r_k}(Γ_k) denote the subspace
of L^2(Γ_k) spanned by polynomials with degree at most r_k and let 𝒫_r = ⊗_{k=1}^m 𝒫_{r_k}
be a polynomial subspace of L^2(Γ). Denote by n_pc the dimension of 𝒫_r and let
U^{(m,n_pc)} be the solution of (9.53) with a parametric model a(x, Z) in place of a(x) and
Z approximated by a member of 𝒫_r. Denote by U^{(m)}(x) the solution of (9.53) with
a(x, Z) and f(x, Z) in place of a(x) and f(x). The function U^{(m,n_pc)}(x) is U^{(m)}(x) for
a(x, Z) and f(x, Z) in which Z is approximated by polynomial chaoses up to degree
n_pc. The rate of convergence of U^{(m,n_pc)} to U^{(m)} is given by the following statement
([21], Theorem 4.9).

Theorem 9.7 If α ≤ a(x) ≤ β, 0 < α < β < ∞, P-a.s., P coincides with the Lebesgue
measure, Γ = [−1/2, 1/2]^m, and the correlation function of a(x) is piecewise
analytic, then

$$
\| U^{(m)} - U^{(m, n_{pc})} \|_{H^1(D) \times L^2(\Gamma)} \le \exp\big( -c_1\, c_2^{-1/d}\, (\log(n_{pc}))^{1/d} \big), \tag{9.93}
$$

where c_1, c_2 > 0 are constants such that n_pc ≤ exp(c_2 m).

The bound in (9.93) shows that, for a given parametric model a(x, Z) of a(x)
depending on m random variables, the accuracy of U^{(m,n_pc)}(x) increases with the
dimension n_pc of the polynomial chaos representation and decreases with the dimension
d of the physical space. We conclude this section with a numerical example
illustrating the implementation of the stochastic Galerkin method and its potential
limitations.

Example 9.17 Let U(x) be the solution of the stochastic partial differential equation
(9.53) in Example 9.8 with a(x) in (9.62), (9.63), and (9.64). We apply the stochastic
Galerkin method to characterize U(x). The implementation of this method requires
discretizing the probability and physical spaces. The discretization of the probability
space is commonly accomplished by replacing the random field a(x) in (9.53) with
the linear parametric model a^{(n,ζ)}(x) in (9.66), which can be given in the form

$$
a^{(n,\zeta)}(x, \omega) = a_0 + \sum_{r=1}^{m} a_r(x)\, Z_r(\omega), \tag{9.94}
$$

where m = 2n, and {Z_r} and {a_r(x)} correspond to the random variables and the
deterministic functions in the expression of a^{(n,ζ)}(x). For example, Z_1 = C_1, Z_2 =
D_1, a_1(x) = ζ σ_1 cos(ν^{(1)} · x), and a_2(x) = ζ σ_1 sin(ν^{(1)} · x). The random variables
{C_k, D_k} are independent U(−√3, √3), so that they have mean 0 and variance 1,
and ζ = 0.3286 is a scale factor such that most samples of a^{(n,ζ)}(x) are positive
(Example 9.8). The random variables {Z_r} in (9.94) are approximated by

$$
Z_r(\omega) \simeq \sum_{k=0}^{n_{pc}} a_{r,k}\, \tilde{H}_k(G_r(\omega)), \tag{9.95}
$$

where n_pc ≥ 0 is an integer denoting the largest degree of polynomial chaoses
considered for solution, {H_k(x)} denote Hermite polynomials (Sect. B.6.1), H̃_k(x) =
H_k(x)/√(k!) so that E[H̃_k(G_r) H̃_l(G_r)] = δ_{kl}, G_r ~ N(0, 1), and the coefficients in
the expression of Z_r(ω) are a_{r,k} = E[Z_r H̃_k(G_r)]. Since Z_r has mean 0, the coefficient
a_{r,0} = E[Z_r] is zero. Note that a^{(n,ζ)}(x) in (9.94) with
Z_r in (9.95) takes the form in (9.89).

The discretization of the physical space has been discussed in Sect. 9.4.8.2.
It represents the solution U(x, ω) as a member of a finite dimensional space spanned
by a family of specified functions (ξ_1(x), ..., ξ_{n_e}(x)) for almost all ω ∈ Ω, that is,

$$
U(x, \omega) \simeq \sum_{i=1}^{n_e} U_i(\omega)\, \xi_i(x), \qquad x \in D, \quad \omega \in \Omega. \tag{9.96}
$$

The coefficients {U_i} are functions of the random vector Z = (Z_1, ..., Z_m), which
is mapped by (9.95) into a Gaussian vector G = (G_1, ..., G_m) with independent
N(0, 1) coordinates. The random variables {U_i} are represented approximately by
their projections on the subspace of 𝒲(D, Ω) spanned by the polynomial chaoses

$$
\Big\{ \prod_{r=1}^{m} \tilde{H}_{k_r}(G_r), \quad k_1 + \cdots + k_m = 0, 1, \ldots, n_{pc}, \quad k_1, \ldots, k_m \ge 0 \Big\}, \tag{9.97}
$$

up to degree n_pc (Sect. B.6.2), that is,

$$
U_i(\omega) \simeq \sum_{\substack{k_1 + \cdots + k_m = 0 \\ k_1, \ldots, k_m \ge 0}}^{n_{pc}} u_{i, k_1, \ldots, k_m} \prod_{r=1}^{m} \tilde{H}_{k_r}(G_r(\omega)), \tag{9.98}
$$

where u_{i,k_1,...,k_m} are real-valued coefficients. If these coefficients are known,
properties of U(x) can be calculated simply from (9.96) and (9.98) by Monte Carlo
simulation.
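The index set in (9.97) and (9.98) can be enumerated explicitly, which gives the number of polynomial chaoses attached to each spatial coefficient. The sketch below lists all multi-indices with nonnegative entries whose sum does not exceed n_pc and compares their number with the binomial coefficient (m + n_pc)!/(m! n_pc!). The function name `chaos_multi_indices` is an illustrative choice.

```python
from itertools import product
from math import comb

def chaos_multi_indices(m, n_pc):
    """All (k_1, ..., k_m) with nonnegative integer entries and k_1 + ... + k_m <= n_pc."""
    return [k for k in product(range(n_pc + 1), repeat=m) if sum(k) <= n_pc]

m, n_pc = 8, 2                      # values used in the numerical results of this example
indices = chaos_multi_indices(m, n_pc)
print(len(indices), comb(m + n_pc, n_pc))
```

For m = 8 and n_pc = 2 there are 45 polynomial chaoses. The system of 4095 equations reported for this example equals 45 × 91, which would be consistent with 91 spatial unknowns, although this count is not stated in the text.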

We replace the solution of (9.53) with that of

$$
\sum_{i=1}^{2} \frac{\partial a(x)}{\partial x_i}\, \frac{\partial V(x)}{\partial x_i} + a(x)\, \Delta V(x)
= -\frac{1}{l_1}\, \frac{\partial a(x)}{\partial x_1}, \tag{9.99}
$$

where V(x) and U(x) are related by V(x) = U(x) − x_1/l_1. This equation is preferred
since it has homogeneous Dirichlet boundary conditions. The boundary conditions
for (9.99) are V(0, x_2) = 0 and V(l_1, x_2) = 0 for x_2 ∈ (0, l_2) and ∂V(x_1, 0)/∂x_2 =
∂V(x_1, l_2)/∂x_2 = 0 for x_1 ∈ (0, l_1). The solution V(x, ω) of (9.99) admits the same
representation as U(x, ω), that is,

$$
V(x, \omega) \simeq \sum_{i=1}^{n_e} V_i(\omega)\, \xi_i(x), \quad x \in D, \ \omega \in \Omega, \quad \text{and} \quad
V_i(\omega) \simeq \sum_{\substack{k_1 + \cdots + k_m = 0 \\ k_1, \ldots, k_m \ge 0}}^{n_{pc}} v_{i, k_1, \ldots, k_m} \prod_{r=1}^{m} \tilde{H}_{k_r}(G_r(\omega)), \tag{9.100}
$$

where v_{i,k_1,...,k_m} are real-valued coefficients that need to be determined.

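The weak form of (9.99) results by using the representations in (9.94), (9.95),
and (9.100) for a^{(n,ζ)}(x, ω) and V(x, ω), multiplying the resulting form of (9.99)
by ξ_j(x) Π_{q=1}^m H̃_{s_q}(G_q) for j = 1, ..., n_e and s_1 + ··· + s_m = 0, 1, ..., n_pc such
that s_1, ..., s_m ≥ 0, and integrating over D × Ω. We obtain

$$
\begin{aligned}
&\sum_{i=1}^{n_e} {\sum}'\, v_{i, k_1, \ldots, k_m} \Bigg\{
\sum_{r=1}^{m} \sum_{l=1}^{2} \sum_{k=0}^{n_{pc}} a_{r,k}
\int_D \frac{\partial a_r(x)}{\partial x_l}\, \frac{\partial \xi_i(x)}{\partial x_l}\, \xi_j(x)\, dx \;
E\Big[ \tilde{H}_k(G_r) \prod_{p=1}^{m} \tilde{H}_{k_p}(G_p) \prod_{q=1}^{m} \tilde{H}_{s_q}(G_q) \Big] \\
&\qquad + a_0 \int_D \Delta \xi_i(x)\, \xi_j(x)\, dx \;
E\Big[ \prod_{p=1}^{m} \tilde{H}_{k_p}(G_p) \prod_{q=1}^{m} \tilde{H}_{s_q}(G_q) \Big] \\
&\qquad + \sum_{r=1}^{m} \sum_{k=0}^{n_{pc}} a_{r,k}
\int_D a_r(x)\, \Delta \xi_i(x)\, \xi_j(x)\, dx \;
E\Big[ \tilde{H}_k(G_r) \prod_{p=1}^{m} \tilde{H}_{k_p}(G_p) \prod_{q=1}^{m} \tilde{H}_{s_q}(G_q) \Big] \Bigg\} \\
&\quad = -\frac{1}{l_1} \sum_{r=1}^{m} \sum_{k=0}^{n_{pc}} a_{r,k}
\int_D \frac{\partial a_r(x)}{\partial x_1}\, \xi_j(x)\, dx \;
E\Big[ \tilde{H}_k(G_r) \prod_{q=1}^{m} \tilde{H}_{s_q}(G_q) \Big],
\end{aligned} \tag{9.101}
$$

where the symbol Σ′ means the sum over k_1 + ··· + k_m = 0, 1, ..., n_pc with
k_1, ..., k_m ≥ 0. This equation written for j = 1, ..., n_e and s_1 + ··· + s_m = 0, 1, ..., n_pc
such that s_1, ..., s_m ≥ 0 supplies a system of algebraic equations for the coefficients
v_{i,k_1,...,k_m}. The solution of this system of algebraic
equations and (9.100) define an approximate representation for V(x, ω) that can be
used to calculate properties of V(x) and U(x).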

Numerical results have been obtained for n_pc = 2, n = 4, m = 8, and ζ = 0.3286.
The coordinates of the frequencies ν^{(k)}, k = 1, ..., 4, in the representation of a^{(n,ζ)}
given by (9.94) are ν_1 = ±1.5 and ν_2 = ±1.5, and {σ_k^2} are the volumes under
the spectral density in Fig. 9.5 for {(ν_1, ν_2) : ν_1 > 0, ν_2 > 0}, {(ν_1, ν_2) : ν_1 > 0,
ν_2 < 0}, {(ν_1, ν_2) : ν_1 < 0, ν_2 > 0}, and {(ν_1, ν_2) : ν_1 < 0, ν_2 < 0}. This
representation is one of the cases considered in Example 9.8, and illustrated in Figs.
9.8 and 9.9 (top right panel).

We have used polynomial chaoses of low degree and just a few harmonics to
minimize calculations. For our selection n_pc = 2 and n = 4, a system with 4095
equations has been obtained and solved to find the coefficients v_{i,k_1,...,k_m} in the
expansion of V(x, ω). Larger values for n_pc and n may not be justified since the
conditions that a^{(n,ζ)} must satisfy yield solutions U(x) with very small variance.

The left and right top panels in Fig. 9.29 show expectations of U(x) obtained 
by Monte Carlo simulation and the stochastic Galerkin method. The bottom left 
and right panels show standard deviations of U(x) by Monte Carlo and stochastic 
Galerkin. The Monte Carlo estimates of the mean and standard deviation of U(x) are
based on 800 samples. The mean and standard deviation of U(x) delivered by the
stochastic Galerkin method are satisfactory. ◊

9.4.9 Stochastic Collocation Method 

Consider the stochastic initial-boundary value problem in (9.33). Suppose the random
fields in this equation are replaced by parametric models depending on a random
vector Z defined on a probability space (Ω, ℱ, P). The representations a(x, Z) in
(9.84) or a_T^{(n)}(x) in (9.87) for the random conductivity a(x) in (9.62) are examples of
parametric models. The solution U(x, t, Z) of this version of (9.33) is a deterministic
function of the spatial and temporal coordinates (x, t) and the random vector Z, so
that U(x, t, Z) is a parametric model.

Fig. 9.29 Expectations of U(x) by Monte Carlo (top left panel) and stochastic Galerkin (top right
panel) and standard deviations of U(x) by Monte Carlo (bottom left panel) and stochastic Galerkin
(bottom right panel)


The construction of collocation solutions involves three steps. First, collocation
points z^{(i)}, i = 1, ..., n_c, need to be selected in the image Γ = Z(Ω) of Z. Second,
n_c deterministic analyses need to be performed to find the solutions u^{(i)}(x, t) =
U(x, t, z^{(i)}) of (9.33) with Z = z^{(i)}. Finite element or any other method can be used to
calculate these solutions. Third, approximations Ũ(x, t, Z) need to be developed for
U(x, t, Z) from the deterministic solutions u^{(i)}(x, t), i = 1, ..., n_c. Interpolation
polynomials are commonly used to construct Ũ(x, t, Z). Probabilistic characteristics
of Ũ(x, t, Z) can be obtained efficiently by Monte Carlo simulation, quadratures, or
other methods since the functional form of Ũ(x, t, Z) is known.

The approximate solution Ũ(x, t, Z) can be viewed as a response surface for U(x,
t, Z) developed over the range Γ = Z(Ω) of Z. The performance of the collocation
solution Ũ(x, t, Z) depends on the accuracy of the finite element algorithm used
to calculate {u^{(i)}(x, t)}, the number and the location of collocation points, the type
of interpolation polynomials, and the properties of U(x, t, Z), which are not known.
The accuracy of the algorithm for calculating the deterministic solutions {u^{(i)}(x, t)}
is the only item on which we have full control. The number of collocation points
is primarily dictated by computation time, that is, the time needed to compute a
deterministic solution u^{(i)}(x, t). The location of collocation points is usually selected
by algorithms that are unrelated to the distribution of Z. The dimension of Z must
be kept relatively small if a full tensor grid is used to sample Z, as illustrated by the
following example.

Example 9.18 Let U(x) be the solution of d(a(x)dU(x)/dx)/dx = 0, x ∈
(0, l), 0 < l < ∞, with boundary conditions U(0) = 0 and U(l) = 1, where
a(x) is a strictly positive and bounded random field with continuous samples. We
have

$$
U(x) = \int_0^x K(u)\, du \Big/ \int_0^l K(v)\, dv, \qquad x \in D = (0, l), \tag{9.102}
$$

with the notation K(x) = 1/a(x). Suppose K(x) is the Beta translation random field

$$
K(x) = \alpha + (\beta - \alpha)\, F^{-1}_{\mathrm{Beta}(p,q)} \circ \Phi(G(x)), \qquad x \in D = (0, l), \tag{9.103}
$$

where F_{Beta(p,q)} denotes the distribution of a standard Beta variable with shape
parameters (p, q), 0 < α < β < ∞ are constants, and G(x) is a homogeneous Gaussian
field with mean 0, variance 1, and one-sided spectral density g(ν), ν > 0.

The Monte Carlo solution for this problem requires generating samples of G(x),
calculating corresponding samples of K(x) and U(x) from (9.103) and (9.102), and
estimating properties of U(x) from its samples. Samples of G(x) can be generated
from, for example, the discrete spectral representation (Sect. 3.8.1.2)

$$
G^{(n)}(x) = \sum_{k=1}^{n} \sigma_k^{(n)} \big( A_k \cos(\nu_k^{(n)} x) + B_k \sin(\nu_k^{(n)} x) \big), \qquad x \in (0, l), \tag{9.104}
$$

where n ≥ 1 is an integer, 0 < ν̄ < ∞ denotes a cutoff frequency, ν_k^{(n)} =
(k − 1/2) ν̄/n, (σ_k^{(n)})^2 is the area under α g(ν) in the interval (ν_k^{(n)} − ν̄/(2n), ν_k^{(n)} +
ν̄/(2n)), k = 1, ..., n, and α is a scaling factor selected such that Σ_{k=1}^n (σ_k^{(n)})^2 = 1, that is,
G^{(n)}(x) has unit variance.

The stochastic collocation method can use a parametric model for K(x) given by
(9.103) with G(x) replaced by

$$
G^{(n)}(x) = \sum_{k=1}^{n} \sigma_k^{(n)} \big( \Phi^{-1}(U_k) \cos(\nu_k^{(n)} x) + \Phi^{-1}(V_k) \sin(\nu_k^{(n)} x) \big), \qquad x \in (0, l), \tag{9.105}
$$

with the notations in (9.104), where {U_k, V_k} are independent U(0, 1) variables. The
resulting representation K^{(n)}(x) for K(x) is a nonlinear parametric model depending
on the m-dimensional vector Z = (U_1, V_1, ..., U_n, V_n), m = 2n, so that the
collocation solution is a function Ũ(x, Z) of x ∈ D and Z. The implementation of the
collocation method requires selecting collocation points z^{(i)}, i = 1, ..., n_c, in the
image Γ = Z(Ω) = [0, 1]^m of Z, calculating deterministic solutions u^{(i)}(x)
corresponding to Z = z^{(i)}, and constructing approximations Ũ(x, Z) for U(x) by interpolating
over Γ between {u^{(i)}(x)}. There is no general methodology for selecting collocation
points in an optimal manner. If factorial design with two levels for each coordinate
of Z is used, there will be n_c = 2^m collocation points, for example, 16, 1024, and
1048576 points for m = 4, 10, and 20, respectively. This suggests that collocation
solutions based on a full tensor grid are not feasible in applications since Z is likely
to have a relatively large dimension. ◊
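The growth of the full tensor grid with the dimension of Z is easy to reproduce. The sketch below builds a two-level factorial design in [0, 1]^m lazily and counts its points. The level values 0.25 and 0.75 and the function name `two_level_grid` are hypothetical choices made only for illustration.

```python
from itertools import product

def two_level_grid(m, low=0.25, high=0.75):
    """Iterator over the 2**m points of a two-level factorial design in [0, 1]^m."""
    return product((low, high), repeat=m)

# A small grid can be listed explicitly.
points = list(two_level_grid(4))
print(len(points), points[0], points[-1])

# For larger m only the count is computed; each point costs one deterministic solution.
for m in (4, 10, 20):
    print(m, 2**m)
```

Each point of the grid requires one deterministic solution, so the computational effort grows exponentially with m.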

An additional difficulty relates to the accuracy of polynomial interpolations that
deteriorates with the dimension of Z. We summarize essential properties of polynomial
approximations for functions defined on intervals of the real line and extend
these results to functions defined on bounded subsets of ℝ^d, d > 1.

Interpolation polynomials commonly used to construct collocation solutions are
defined in Sect. B.6 within the framework of the Sturm-Liouville differential equations.
The following facts on the accuracy of polynomial approximations for real-valued
functions are summarized for convenience. Let I be a bounded interval of
the real line, w : I → ℝ a positive weighting function, and L^2_w(I) = {f : I →
ℝ : ∫_I f(x)^2 w(x)dx < ∞}. Note that (f, g)_{L^2_w(I)} = ∫_I f(x)g(x)w(x)dx, f, g ∈
L^2_w(I), is an inner product on L^2_w(I) inducing the norm ‖f‖^2_{L^2_w(I)} = ∫_I f(x)^2 w(x)dx
on this space. Let {φ_k(x), k = 0, 1, ..., n} be orthonormal polynomials in 𝒫_n(I) with
degree at most n, so that (φ_k, φ_l)_{L^2_w(I)} = δ_{kl} and the projection of f ∈ L^2_w(I) on
the space spanned by these polynomials is π_n(f) = Σ_{k=0}^n (f, φ_k)_{L^2_w(I)} φ_k(x). Then
‖f − π_n(f)‖_{L^2_w(I)} ≤ inf_{p ∈ 𝒫_n(I)} ‖f − p‖_{L^2_w(I)} ([36], Theorem 3.3, and Sect. B.4.1
in this book). It can also be shown that ‖f − π_n(f)‖_{L^2_w(I)} → 0 as n → ∞ ([36],
Theorem 3.5). The rate of convergence of π_n(f) to f depends on the smoothness of
f. For example, if f and its first p > 0 derivatives are in L^2(I) and w(x) = 1, that
is, f is a member of the Sobolev space H^p(I) defined by (9.36), I = [−1, 1], and
φ_k(x) are Legendre polynomials, then ‖f − π_n(f)‖_{L^2(I)} ≤ c n^{−p} ‖f‖_{H^p(I)}, where
c > 0 is a constant and ‖·‖_{H^p(I)} denotes the norm in (9.37) ([36], Theorem 3.5).
Hence, the error of a polynomial approximation π_n(f) is of order O(n^{−p}), so that
it relates directly to the degree of smoothness of f.
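The dependence of the projection error on smoothness can be observed numerically. The sketch below computes the L^2([−1, 1]) error of Legendre projections with w(x) = 1 for two test functions chosen here for illustration, the analytic function exp(x) and the function |x|, which has only one square integrable derivative. Inner products are evaluated by Gauss-Legendre quadrature; the function name `legendre_projection_error` and the number of quadrature nodes are hypothetical choices.

```python
import numpy as np
from numpy.polynomial import legendre as L

def legendre_projection_error(f, n, n_quad=200):
    """L2([-1, 1]) error of the projection of f on polynomials of degree at most n."""
    x, w = L.leggauss(n_quad)                      # quadrature nodes and weights
    fx = f(x)
    approx = np.zeros_like(x)
    for k in range(n + 1):
        p_k = L.legval(x, [0.0] * k + [1.0])       # Legendre polynomial P_k at the nodes
        norm2 = 2.0 / (2 * k + 1)                  # squared L2 norm of P_k on [-1, 1]
        coef = np.sum(w * fx * p_k) / norm2        # (f, P_k) / ||P_k||^2
        approx += coef * p_k
    return float(np.sqrt(np.sum(w * (fx - approx) ** 2)))

degrees = (2, 4, 8, 16)
err_smooth = [legendre_projection_error(np.exp, n) for n in degrees]
err_rough = [legendre_projection_error(np.abs, n) for n in degrees]
for n, e1, e2 in zip(degrees, err_smooth, err_rough):
    print(n, e1, e2)
```

The errors for exp(x) fall to round-off level within a few degrees, while those for |x| decay only algebraically, in line with the O(n^{−p}) rate.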

Consider now the solution U(x, t, Z) of (9.33) and suppose that n_{c,1} collocation
points are used along each coordinate of Z, so that the full tensor grid has n_c = n_{c,1}^m
collocation points, where m = 2n denotes the dimension of Z. The error of a polynomial
approximation π_{n_{c,1}}(Z_i) for U(x, t, Z) viewed as a function of Z_i and all
other arguments arbitrary but fixed is of order O(n_{c,1}^{−p}), where p denotes the order
of differentiation of U(x, t, Z) with respect to Z_i. Since n_{c,1} = n_c^{1/m}, the error of the polynomial
approximation p(Z) = Π_{r=1}^m π_{n_{c,1}}(Z_r) for U(x, t, Z) constructed on the full tensor
grid is of order O(n_c^{−p/m}), so that it increases rapidly with the dimension of Z. The
accuracy of collocation solutions depends also on the location of collocation points,
as illustrated by the following example.

Example 9.19 Let U be the solution of the stochastic algebraic equation Z U = 1,
where Z is a Beta random variable with range (α, β), 0 < α < β < ∞, and shape
parameters (p, q). The collocation solution has the expression

$$
\tilde{U}(Z) = \sum_{i=0}^{n} \frac{1}{\alpha + i(\beta - \alpha)/n}\, \ell_i(Z), \tag{9.106}
$$

where ℓ_i(Z) = Π_{j=0, j≠i}^n (Z − z^{(j)})/(z^{(i)} − z^{(j)}) are Lagrange interpolation
polynomials (Example 8.16). The left panel in Fig. 9.30 shows Lagrange interpolation
polynomials ℓ_i(Z) constructed on n_c = 5 equally spaced collocation points, two of
which are α and β. The solid and the dotted lines in the right panel of the figure
are the exact mapping Z ↦ U(Z) = 1/Z and the mapping Z ↦ Ũ(Z) in (9.106).
The approximate mapping provides a reasonable overall representation for U(Z). The
table,

Fig. 9.30 Lagrange polynomials ℓ_i(Z) (left panel) and exact and approximate mappings (right
panel)


(p = 1/2, q = 3)   30.96;  55.37;  69.12;  77.07;  82.06;  85.45
(p = 1, q = 1)     25.91; 100.12; 161.37; 200.43; 226.68; 245.72
(p = 2, q = 5)     35.18; 135.23; 287.36; 452.44; 605.25; 738.50,

gives errors 100(E[Ũ(Z)^r] − E[U(Z)^r])/E[U(Z)^r] in percentages for the first six
moments of Ũ(Z) relative to corresponding exact moments for [α, β] = [0.1, 3]
and several values of the shape parameters (p, q). The errors depend strongly on
both properties of Z and the quality of the approximate mapping Z ↦ Ũ(Z) in the
intervals of likely values of Z. For example, errors are significant for (p = 2, q = 5)
since the density of Z is skewed to the left for these parameters and the mapping Z ↦
Ũ(Z) is the least accurate in the interval (0.2, 0.7). The errors for (p = 2, q = 5) are
reduced to −7.74, 23.01, 70.88, 129.00, 184.12, and 233.12 if the collocation points
(0.1, 0.5, 1, 2, 3) are used, suggesting that the collocation grid needs to be refined
in the subsets of Γ = Z(Ω) in which Z resides with relatively high probabilities
and Ũ(Z) is less accurate. Unfortunately, this scheme cannot be implemented fully
since it requires information on the behavior of U(Z) as a function of Z, and this
information is not available. ◊
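The collocation solution of Example 9.19 is simple enough to sketch in a few lines. The code below builds the Lagrange interpolant of 1/Z on a given set of collocation points and estimates the relative errors of its first six moments by Monte Carlo. It is an illustration only: the moments are sample estimates, the sample size and seed are arbitrary, and the output is not expected to reproduce the digits of the table. The function names `lagrange_interpolant` and `moment_errors` are hypothetical.

```python
import numpy as np

def lagrange_interpolant(nodes, values):
    """Return the Lagrange interpolation polynomial through (nodes, values)."""
    nodes = np.asarray(nodes, dtype=float)
    values = np.asarray(values, dtype=float)

    def u_tilde(z):
        z = np.asarray(z, dtype=float)
        total = np.zeros_like(z)
        for i, z_i in enumerate(nodes):
            ell = np.ones_like(z)                  # Lagrange polynomial ell_i(z)
            for j, z_j in enumerate(nodes):
                if j != i:
                    ell *= (z - z_j) / (z_i - z_j)
            total += values[i] * ell
        return total

    return u_tilde

def moment_errors(nodes, alpha, beta, p, q, n_moments=6, n_samples=200_000, seed=0):
    """Monte Carlo estimates of the percentage errors of the moments of the interpolant."""
    rng = np.random.default_rng(seed)
    z = alpha + (beta - alpha) * rng.beta(p, q, n_samples)   # Beta variable on (alpha, beta)
    u_exact = 1.0 / z
    u_approx = lagrange_interpolant(nodes, 1.0 / np.asarray(nodes, dtype=float))(z)
    return [100.0 * (np.mean(u_approx**r) - np.mean(u_exact**r)) / np.mean(u_exact**r)
            for r in range(1, n_moments + 1)]

alpha, beta = 0.1, 3.0
equal_nodes = np.linspace(alpha, beta, 5)          # n_c = 5 equally spaced points
refined_nodes = np.array([0.1, 0.5, 1.0, 2.0, 3.0])
print(moment_errors(equal_nodes, alpha, beta, p=2, q=5))
print(moment_errors(refined_nodes, alpha, beta, p=2, q=5))
```

Comparing the two printed lists shows how moving collocation points toward the region where Z is likely to take values changes the moment errors.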
We have seen that the accuracy of collocation solutions depends on the quality
of the parametric models used to represent the random fields in the definition of
stochastic equations, for example, the parametric model a^{(n)}(x) of a(x) in (9.65),
the distribution postulated for the m-dimensional random vector Z in the definition
of parametric models, the accuracy of deterministic solvers, and the performance of
polynomial interpolation. Let U^{(n)}(x) be a collocation solution corresponding to a
parametric model a^{(n)}(x) of a(x), π_h U^{(n)} projections of U^{(n)}(x) on, for example,
the finite element space, and 𝒫(π_h U^{(n)}) polynomial interpolations constructed on
the finite element solutions π_h U^{(n)}. We have


$$
\| U - \mathcal{P}(\pi_h U^{(n)}) \| \le \| U - U^{(n)} \| + \| U^{(n)} - \pi_h U^{(n)} \| + \| \pi_h U^{(n)} - \mathcal{P}(\pi_h U^{(n)}) \| \tag{9.107}
$$

in some norm. Bounds on the first two norms result from Theorem 9.4 and (9.59).
A relatively simple bound on the third norm can be obtained under the assumptions
that (1) the solution π_h U^{(n)}(x, y) viewed as a function of (x, y) ∈ D × Γ, Γ =
Z(Ω), admits an analytic extension in Σ_j = {z ∈ ℂ : dist(z, Γ_j) ≤ σ_j}, j =
1, ..., m, σ_j > 0, for each coordinate of y = (y_1, ..., y_j, ..., y_m), (2) the density
of Z is uniform, and (3) the range Γ of Z is partitioned in finite elements with mesh size
h. Under these assumptions, the discrepancy between π_h U^{(n)} and its approximation
𝒫(π_h U^{(n)}) given by Lagrange polynomials satisfies the inequality


d 



(9.108) 


where $r_j(h) = \log\big((\sigma_j/(2h))\,[1 + \sqrt{1 + h^2/\sigma_j^2}\,]\big)$ and $c > 0$ denotes a constant
[29, 37]. The accuracy of polynomial approximation increases as the mesh is refined,
that is, the mesh size $h$ is reduced. Note that finer meshes are needed for larger
vectors $Z$. Alternative bounds on the accuracy of polynomial interpolation in the
probability space can be found in [38] and [36] (Chap. 7) for both full tensor and
sparse collocation grids.

It turns out that the full tensor grid can be reduced significantly while preserving 
the accuracy of resulting interpolation formulas by using sparse grid collocation 
techniques based on the Smolyak algorithm. For the construction, properties, and 
implementation of the Smolyak algorithm the reader is directed to [29, 37, 39, 38]. 
Additional information on sparse grid collocation can be found in [36] (Sect. 7.2.2). 
In contrast to collocation solutions based on full tensor grids that are feasible for 
problems depending on a small number of random parameters [31], sparse grid 
collocation solutions can be used to solve relatively large stochastic problems [39]. 

We conclude this section with the solution of the stochastic partial differential 
equation in Example 9.17 by the stochastic collocation method. The analysis uses 
the same linear parametric model for a(x ) as in this example. 

Example 9.20 Let $U(x)$ be the solution of the stochastic partial differential equation
(9.53) in Example 9.8, where $a(x)$ is approximated by the linear parametric model
$a^{(n)}(x)$ in (9.66). Let $Z = (C_1, D_1, \ldots, C_n, D_n)$ be the $\mathbb{R}^m$-valued random variable
collecting the random coefficients in the expression of $a^{(n)}(x)$, where $m = 2n$.
Denote by $\Gamma_r = Z_r(\Omega)$, $r = 1, \ldots, m$, the images of the coordinates of $Z$, so that
$\Gamma = Z(\Omega) = \times_{r=1}^m \Gamma_r$. Let $(\eta_{r,1}, \ldots, \eta_{r,n_{c,r}})$ be collocation points along coordinate
$r$ of $Z$, where $n_{c,r} \ge 1$ is an integer. The collocation solution has the form

$$U(x) = \sum_{i=1}^{n_c} u^{(i)}(x)\, \ell^{(i)}(Z), \tag{9.109}$$

where $n_c = \prod_{r=1}^m n_{c,r}$ denotes the number of collocation points, $\{u^{(i)}(x)\}$ are
solutions of (9.53) for $Z$ set equal to the selected collocation points $\eta^{(i)} \in \Gamma$,
$i = 1, \ldots, n_c$,

$$\ell^{(i)}(Z) = \prod_{r=1}^{2n} \ell_{r,i}(Z_r), \quad \text{and} \quad \ell_{r,i}(Z_r) = \prod_{j=1,\, j \ne i_r}^{n_{c,r}} \frac{Z_r - \eta_{r,j}}{\eta_{r,i_r} - \eta_{r,j}} \tag{9.110}$$

are Lagrange interpolation polynomials along the coordinates of $Z$, with $\eta_{r,i_r}$
denoting coordinate $r$ of the collocation point $\eta^{(i)}$.

Numerical results are for the same models and parameters as in Example 9.17
and collocation points $\{\eta_{r,j}\} = \{-3/\sqrt{5},\, 0,\, 3/\sqrt{5}\}$ along each coordinate of $Z$. Since
$Z$ has $m = 2n = 8$ coordinates, the approximation of the solution $U(x)$ is based on
$3^m = 6561$ collocation points, so that it involves 6561 deterministic analyses. The left
and right top panels in Fig. 9.31 show expectations of $U(x)$ obtained by Monte Carlo
simulation and the stochastic collocation method. The bottom left and right panels
show standard deviations of $U(x)$ by the Monte Carlo and stochastic collocation
methods. The Monte Carlo estimates of the mean and standard deviation of $U(x)$ are
based on 800 samples. The mean and standard deviation of $U(x)$ delivered by the
stochastic collocation method are satisfactory. ◊
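The construction in (9.109) and (9.110) can be illustrated with a minimal sketch. It is not the solver used for the figures: `toy_solver` is a hypothetical stand-in for a deterministic analysis of (9.53), and the sketch uses only $m = 2$ coordinates so that it runs instantly. The collocation points along each coordinate are the three points quoted in the example.

```python
import itertools
import numpy as np

def lagrange_basis(nodes, i, z):
    """Lagrange polynomial along one coordinate: 1 at nodes[i], 0 at the other nodes."""
    terms = [(z - nodes[j]) / (nodes[i] - nodes[j])
             for j in range(len(nodes)) if j != i]
    return float(np.prod(terms))

def collocation_surrogate(solver, nodes, m):
    """Full tensor-grid collocation: one deterministic solver call per grid point."""
    index_grid = list(itertools.product(range(len(nodes)), repeat=m))
    values = {idx: solver(np.array([nodes[i] for i in idx])) for idx in index_grid}

    def surrogate(z):
        total = 0.0
        for idx, u in values.items():
            # product of one-dimensional Lagrange polynomials, as in (9.110)
            weight = np.prod([lagrange_basis(nodes, i, z[r]) for r, i in enumerate(idx)])
            total += u * weight
        return total

    return surrogate, len(index_grid)

nodes = np.array([-3 / np.sqrt(5), 0.0, 3 / np.sqrt(5)])

# Hypothetical scalar "solution" depending on two random coefficients (assumption).
toy_solver = lambda z: 1.0 / (4.0 + z[0] + 0.5 * z[1])

surrogate, n_c = collocation_surrogate(toy_solver, nodes, m=2)
n_full_example = len(nodes) ** 8   # grid size for m = 8 coordinates
```

The surrogate reproduces the solver exactly at the collocation points, and the grid size grows as $3^m$, which is why full tensor grids are feasible only for a small number of random parameters.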

The stochastic Galerkin and collocation methods provide accurate approximations 
for the mean and standard deviation of the solution U(x) in Example 9.17. At a first 
glance, the remarkable performance of these methods may be surprising since both 
methods have used coarse representations of uncertainty. The low uncertainty in U(x) 
is the likely reason for the satisfactory performance of these methods. 


9.5 Applied SPDEs: Small Uncertainty 

Consider the stochastic elliptic boundary value problem in (9.53) with weak form
given by (9.54). For simplicity, suppose the source term $f(x)$ is deterministic and
the coefficient $a(x)$ can be approximated by the linear parametric model
$a(x, Z) = \sum_{k=1}^m a_k(x) Z_k$ in (9.84). Note that the solution $U(x, Z)$ of (9.53) is also a parametric
model. We assume that the stochastic boundary value problems under consideration
admit unique solutions P-a.s.

Fig. 9.31 Expectations of $U(x)$ by Monte Carlo (top left panel) and stochastic collocation (top right
panel) and standard deviations of $U(x)$ by Monte Carlo (bottom left panel) and stochastic collocation
(bottom right panel)


9.5.1 Taylor Series 

The first order Taylor approximation of $U(x, Z)$ has the form

$$U(x, Z) \simeq U(x, \mu) + \sum_{k=1}^m V_k(x, \mu)(Z_k - \mu_k) = U(x, \mu) + \nabla_Z U(x, \mu) \cdot (Z - \mu), \tag{9.111}$$

where $\mu_k = E[Z_k]$, $\mu = E[Z]$, $V_k(x, Z) = \partial U(x, Z)/\partial Z_k$, and $m$ is the dimension
of $Z$. To find properties of the above approximation of $U(x, Z)$, we need
to calculate $U(x, \mu)$ and $\{V_k(x, \mu)\}$. The solution of the deterministic version of
(9.53) with $Z = \mu$ gives $U(x, \mu)$. The gradient of $U(x, Z)$ with respect to $Z$ at $Z = \mu$ can be
obtained by differentiating (9.53) with respect to $Z_j$, $j = 1, \ldots, m$, which gives
$-\nabla \cdot (a_j(x) \nabla U(x, Z)) - \sum_{k=1}^m Z_k \nabla \cdot (a_k(x) \nabla V_j(x, Z)) = 0$, and solving for
$V_j(x, Z)$ at $Z = \mu$. These operations give

$$-\sum_{k=1}^m \mu_k \nabla \cdot (a_k(x) \nabla V_j(x, \mu)) = \nabla \cdot (a_j(x) \nabla U(x, \mu)), \quad j = 1, \ldots, m.$$

Since $U(x, Z) = 0$ on $\partial D$ P-a.s., we can solve for $V_j(x, \mu)$.

The representation in (9.111) can be used to find properties of $U(x, Z)$ approximately.
For example, its mean and correlation functions are

$$E[U(x, Z)] \simeq U(x, \mu),$$

$$E[U(x, Z) U(y, Z)] \simeq U(x, \mu) U(y, \mu) + \sum_{k,l=1}^m V_k(x, \mu) V_l(y, \mu)\, \gamma_{kl}, \tag{9.112}$$

where $\{\gamma_{kl} = E[(Z_k - \mu_k)(Z_l - \mu_l)]\}$ is the covariance matrix of $Z$ and $x, y \in D$.

Example 9.21 The first order Taylor approximation of $U(x, Z)$ defined by the stochastic
boundary value problem $-Z U''(x, Z) = 1$, $x \in (0, 1)$, with the homogeneous
boundary conditions $U(0, Z) = U(1, Z) = 0$ is

$$U(x, Z) \simeq (x - x^2)/(2\mu) + (-x + x^2)(Z - \mu)/(2\mu^2), \tag{9.113}$$

since $U(x, Z) = (x - x^2)/(2Z)$ and $V(x, Z) = \partial U(x, Z)/\partial Z = (-x + x^2)/(2Z^2)$.
The mean and covariance functions for this approximation of $U(x, Z)$ are

$$E[U(x, Z)] \simeq \frac{x - x^2}{2\mu}, \qquad \mathrm{Cov}[U(x, Z), U(y, Z)] \simeq \frac{\mathrm{Var}[Z]}{4\mu^4}(-x + x^2)(-y + y^2), \tag{9.114}$$

and depend on only the first two moments of $Z$. ◊
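The formulas in (9.113) and (9.114) can be checked numerically against the exact solution $U(x, Z) = (x - x^2)/(2Z)$. The sketch below is only illustrative: the uniform distribution of $Z$ on $(1.2, 3.0)$, the evaluation point $x = 0.5$, the sample size, and the random seed are assumptions made here, not part of the example.

```python
import numpy as np

alpha, beta = 1.2, 3.0                 # assumed range of a uniform Z
mu = 0.5 * (alpha + beta)              # E[Z]
var_z = (beta - alpha) ** 2 / 12.0     # Var[Z]

def taylor_mean(x):
    """Mean of the first order Taylor approximation, (9.114)."""
    return (x - x**2) / (2.0 * mu)

def taylor_cov(x, y):
    """Covariance of the first order Taylor approximation, (9.114)."""
    return var_z / (4.0 * mu**4) * (-x + x**2) * (-y + y**2)

# Monte Carlo reference based on the exact solution U(x, Z) = (x - x^2) / (2 Z)
rng = np.random.default_rng(0)
z = rng.uniform(alpha, beta, size=200_000)
x = 0.5
u = (x - x**2) / (2.0 * z)
mc_mean, mc_var = u.mean(), u.var()

print(f"mean: Taylor {taylor_mean(x):.5f}, Monte Carlo {mc_mean:.5f}")
print(f"var : Taylor {taylor_cov(x, x):.6f}, Monte Carlo {mc_var:.6f}")
```

For this rather large uncertainty in $Z$ the Taylor mean is close to the Monte Carlo estimate while the Taylor variance is visibly smaller, which is consistent with the section title: the approximation is intended for small uncertainty.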


9.5.2 Perturbation Series 

Suppose the random variables in the parametric model $a(x, Z) = \sum_{k=1}^m a_k(x) Z_k$
of $a(x)$ admit the representation $Z_k = \mu_k + \varepsilon \tilde{Z}_k$, where $\varepsilon$ is a small parameter
with respect to $\mu_k$, $E[\tilde{Z}_k] = 0$, and $\mathrm{Var}[\tilde{Z}_k] \sim O(1)$, $k = 1, \ldots, m$. Note that the
variance of $Z_k$ is $\varepsilon^2\, \mathrm{Var}[\tilde{Z}_k]$.

Consider the power series expansion

$$U(x, Z) = U_0(x) + \varepsilon U_1(x) + \varepsilon^2 U_2(x) + \cdots \tag{9.115}$$

for the solution of (9.53). The functions $U_0(x), U_1(x), \ldots$ satisfy the differential
equations

$$(\varepsilon^0): \quad \sum_{k=1}^m \mu_k \nabla \cdot (a_k(x) \nabla U_0(x)) = -f(x)$$

$$(\varepsilon^1): \quad \sum_{k=1}^m \mu_k \nabla \cdot (a_k(x) \nabla U_1(x)) = -\sum_{k=1}^m \tilde{Z}_k \nabla \cdot (a_k(x) \nabla U_0(x)) \tag{9.116}$$

with homogeneous boundary conditions. These equations result from (9.53) with
$U(x, Z)$ in (9.115) by setting to zero the coefficients of the powers of $\varepsilon$. Truncated power
series of $U(x, Z)$ can be used to find approximately moments and other properties of
this random field. For example, the mean and correlation functions of $U(x, Z)$ are

$$E[U(x, Z)] \simeq U_0(x) + O(\varepsilon^2)$$

$$E[U(x, Z) U(y, Z)] \simeq U_0(x) U_0(y) + \varepsilon \big(U_0(x) U_1(y) + U_1(x) U_0(y)\big) + \varepsilon^2 \big(U_0(x) U_2(y) + U_1(x) U_1(y) + U_2(x) U_0(y)\big) + O(\varepsilon^3). \tag{9.117}$$

Example 9.22 For the stochastic equation in Example 9.21, we have

$$(\varepsilon^0): \quad \mu U_0''(x) = -1, \qquad (\varepsilon^1): \quad \mu U_1''(x) = -\tilde{Z}\, U_0''(x), \tag{9.118}$$

with homogeneous boundary conditions, so that

$$U(x, Z) \simeq \frac{x - x^2}{2\mu} + \varepsilon\, \frac{\tilde{Z}(-x + x^2)}{2\mu^2} + O(\varepsilon^2).$$

The corresponding mean and covariance functions,

$$E[U(x, Z)] \simeq \frac{x - x^2}{2\mu} + O(\varepsilon^2),$$

$$\mathrm{Cov}[U(x, Z), U(y, Z)] \simeq \varepsilon^2\, \frac{\mathrm{Var}[\tilde{Z}]}{4\mu^4}(-x + x^2)(-y + y^2) + O(\varepsilon^3), \tag{9.119}$$

have the same expressions as those in (9.114) at the indicated accuracy. ◊


9.5.3 Neumann Series 

Consider the differential equation $\mathcal{L}[U(x)] = V(x)$ in (9.40) with solution
$U(x) = \mathcal{L}^{-1}[V(x)]$. If $\mathcal{L}^{-1}$ exists, it can be represented by a Neumann series
under some conditions. Truncated Neumann series can be used to calculate properties
of $U(x)$. Neumann series solving Fredholm integral equations are discussed in [2]
(Sect. 8.4.1.4). We only illustrate the construction of Neumann series for stochastic
algebraic equations derived from $\mathcal{L}[U(x)] = V(x)$ by spatial discretization.

Fig. 9.32 Approximate mean of $U(x)$ by Neumann series (solid line) and Taylor
series (dotted line) for $\alpha = 1.2$, $\beta = 3.0$, $n = 20$, and $n_{NS} = 20$

Example 9.23 The finite difference approximation for the stochastic differential
equation in Example 9.21 has the form $-Z(U_{i+1} - 2U_i + U_{i-1}) = h^2$ for
$i = 2, \ldots, n - 1$, $-Z(U_2 - 2U_1) = h^2$ for $i = 1$, and $-Z(-2U_n + U_{n-1}) = h^2$ for
$i = n$, where $h = 1/(n + 1)$, $U_i = U(ih)$, $i = 1, \ldots, n$, and nodes 0 and $n + 1$
correspond to $x = 0$ and $x = 1$. The matrix form of the finite difference formulation
is $Z A U = B$, where $A$ is an $(n, n)$-matrix with non-zero entries $A_{i,i} = 2$ and
$A_{i,i+1} = A_{i+1,i} = -1$, $B$ denotes the vector with unit entries scaled by $h^2$, and
$U = (U_1, \ldots, U_n)'$ is the solution vector. An alternative form of the stochastic
algebraic equation $Z A U = B$ is $(1 + \tilde{Z}/\mu) U = (A\mu)^{-1} B$, where $\mu = E[Z]$ and
$\tilde{Z} = Z - \mu$. The Neumann series for $U$ is convergent if there exists $0 < \gamma < 1$ such
that $\|(\tilde{Z}/\mu) x\| \le \gamma \|x\|$ for any $x \in \mathbb{R}^n$ (Theorem 8.19). For $Z \sim U(\alpha, \beta)$, the
Neumann series is convergent if $\beta - \alpha < 2\mu$. The Neumann series representation for
$U$ is $U = \sum_{r=0}^\infty U^{(r)}$, where $U^{(r)} = (-\tilde{Z}/\mu) U^{(r-1)}$, $r = 1, 2, \ldots$, and
$U^{(0)} = (A\mu)^{-1} B$. The mean and correlation matrices of $U$ based on the first $n_{NS}$ terms
of the series representation of $U$ are $E[U(Z)] \simeq \sum_{r=0}^{n_{NS}-1} E[U^{(r)}]$ and
$E[U(Z) U(Z)'] \simeq \sum_{r,s=0}^{n_{NS}-1} E[U^{(r)} (U^{(s)})']$.

The plots in Figs. 9.32 and 9.33 are for $Z \sim U(\alpha, \beta)$, so that
$E[(Z - \mu)^q] = [(\beta - \mu)^{q+1} - (\alpha - \mu)^{q+1}]/(\beta - \alpha)/(q + 1)$, $\alpha = 1.2$, $\beta = 3.0$, $n = 20$, and $n_{NS} = 20$.
Figure 9.32 shows with solid and dotted lines approximations of $E[U(Z)]$ obtained
by the Neumann and Taylor series methods. The left and right panels in Fig. 9.33
show the covariances of $U(Z)$ by the Neumann and Taylor series methods. The first
two moments of $U(Z)$ by the Neumann series match those obtained by Monte Carlo
simulation. The means of $U(Z)$ by the Neumann and Taylor series methods are similar,
but the covariances of $U(Z)$ differ significantly. The unsatisfactory performance of
the Taylor series method is caused by the approximate representation of the nonlinear
mapping $Z \mapsto U(Z)$ by the linear function of $Z$ in (9.111). ◊
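The calculations in this example can be sketched in a few lines. The code below uses the parameters quoted in the example; the variable names and the reduction of the moments of $U$ to scalar factors multiplying $U^{(0)}$ are choices made here for illustration, and the truncation index (terms $r = 0, \ldots, n_{NS} - 1$) is an assumption about how the first $n_{NS}$ terms are counted.

```python
import numpy as np

n, n_ns = 20, 20                 # number of nodes and of Neumann terms
alpha, beta = 1.2, 3.0           # Z ~ U(alpha, beta)
mu = 0.5 * (alpha + beta)
h = 1.0 / (n + 1)

# Finite difference system Z A U = B
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
B = h**2 * np.ones(n)
U0 = np.linalg.solve(mu * A, B)  # U^(0) = (A mu)^{-1} B

def central_moment(q):
    """E[(Z - mu)^q] for Z uniform on (alpha, beta)."""
    return ((beta - mu) ** (q + 1) - (alpha - mu) ** (q + 1)) / (beta - alpha) / (q + 1)

# U^(r) = (-Ztilde/mu)^r U^(0), so moments of U reduce to moments of Ztilde.
mean_factor = sum((-1.0 / mu) ** r * central_moment(r) for r in range(n_ns))
second_factor = sum((-1.0 / mu) ** (r + s) * central_moment(r + s)
                    for r in range(n_ns) for s in range(n_ns))

mean_neumann = mean_factor * U0
cov_neumann = (second_factor - mean_factor**2) * np.outer(U0, U0)

# First order Taylor counterparts, the discrete analogue of (9.114)
mean_taylor = U0
cov_taylor = central_moment(2) / mu**2 * np.outer(U0, U0)
```

Because $A^{-1}B$ is deterministic, the exact mean is $E[1/Z]\, A^{-1} B$ with $E[1/Z] = \log(\beta/\alpha)/(\beta - \alpha)$ for a uniform $Z$, which gives a simple check of the truncated series.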
Fig. 9.33 Approximate covariances of $U(x, Z)$ by Neumann series (left panel) and Taylor series
(right panel) for $\alpha = 1.2$, $\beta = 3.0$, $n = 20$, and $n_{NS} = 20$


9.6 Exercises 

Exercise 9.1 Show that $U^{(n)}$ in Example 9.1 is mean square continuous.

Exercise 9.2 Complete the details of the proof in Example 9.1 showing that the 
weak solution of (9.5) is unique. 

Hint Use Gronwall's inequality stating that, if $\alpha \ge 0$ is a constant, $\beta(t) \ge 0$ and
$\theta(t)$ are real-valued functions defined on a bounded interval $[a, b]$ of the real line,
and $\theta(t) \le \alpha + \int_a^t \beta(s)\theta(s)\, ds$, $a \le t \le b$, then $\theta(t) \le \alpha \exp\big[\int_a^t \beta(s)\, ds\big]$ for
$t \in [a, b]$.

Exercise 9.3 Generalize considerations in Example 9.1 by assuming that the correlation
of the noise process is $E[W(x, t) W(y, s)] = r(x, y) \exp(-\rho |s - t|)$, $\rho > 0$,
that is, $W(x, t)$ is a colored noise in time.

Exercise 9.4 Consider the SPDE $\mathcal{L}[U(x)] = W(x)$, $x \in \mathbb{R}^d$, where $\mathcal{L} = (\Delta - \alpha^2)^p$,
$\alpha > 0$ is a constant, $p \ge 1$ denotes an integer, and $W(x)$ is a Brownian sheet,
that is, a Gaussian field with mean 0 and covariance function
$E[W(x) W(y)] = \prod_{i=1}^d (x_i \wedge y_i)$, where $x = (x_1, \ldots, x_d)$ and $y = (y_1, \ldots, y_d)$. Find the spectral
density of $U(x)$.

Exercise 9.5 Develop a Monte Carlo algorithm for solving the stochastic partial 
differential equation in Example 9.3. 

Exercise 9.6 Check whether the bilinear form $\mathcal{B}(U, W)$ in (9.46) defines an inner
product on the set of functions $W(D)$ defined by (9.45).

Exercise 9.7 Let $f : [a, b] \to \mathbb{R}$ be a real-valued continuous function and $[a, b]$
a bounded interval of the real line, so that $f \in C[a, b]$. Is $\int_a^b |f(x)|\, dx$ a norm
on $C[a, b]$?

Exercise 9.8 Let $X$ and $Y$ be real-valued random variables defined on a probability
space $(\Omega, \mathcal{F}, P)$. Show that $E[XY]$ defines an inner product on $L_2(\Omega, \mathcal{F}, P)$.

Exercise 9.9 Find a constant $c > 0$ satisfying the inequality in (9.61) for several
distributions $F$ of your choice. Does such a constant exist for any distribution $F$?
Exercise 9.10 Let $a_T(x)$, $x \in \mathbb{R}^2$, be a translation random field defined by (9.85)
with $G(x) = \sum_{k=1}^n \big(A_k \cos(\nu^{(k)} \cdot x) + B_k \sin(\nu^{(k)} \cdot x)\big)/\sqrt{6}$, where $n = 6$, $(A_k, B_k)$ are
independent $N(0, 1)$ variables, $\nu^{(1)} = (1, 2)$, $\nu^{(2)} = (2, 1)$, $\nu^{(3)} = (2, 2)$,
$\nu^{(4)} = -\nu^{(1)}$, $\nu^{(5)} = -\nu^{(2)}$, and $\nu^{(6)} = -\nu^{(3)}$. Construct a polynomial chaos representation
for $a_T(x)$ of the type in (9.89).

Exercise 9.11 Find the effective conductivity of an one-dimensional specimen with 
properties as in Example 9.12 by using SROMs. 

Exercise 9.12 Construct estimators for the mean and correlation function of $U(x, t)$
in (9.60).

Exercise 9.13 Calculate the second moment properties of Kj (x) in Example 9.15 
and show that they coincide with those of K\ (x). 

Exercise 9.14 Find the second moment properties of the translation model K(x) in 
(9.103) and compare the covariance function of this field scaled by its variance with 
that of G(x). 

Exercise 9.15 Write explicitly the system of algebraic equation given by (9.101) 
for n pc = 2 and n = 3. Solve Example 9.17 for these parameters. 

Exercise 9.16 Solve the one-dimensional version of the problem in Example 9.17 
by the stochastic Galerkin method. 

Exercise 9.17 Solve the one-dimensional SPDE in Example 9.18 by stochastic
collocation using equally spaced collocation points in $(0, 1)$ for $K(x)$ given by (9.103)
with $\alpha = 0.01$, $\beta = 3$, shape parameters $(p, q) = (1, 3)$, $(1, 1)$, and $(3, 1)$, and
a Gaussian field $G(x)$ with mean 0 and covariance function
$E[G(x') G(x'')] = \exp(-\rho |x' - x''|)$ for $\rho = 0$, 1, and 100. Assess the accuracy of the collocation
solutions by Monte Carlo simulation.
Appendix A

Parametric Models, Quantizers, and Stochastic
Reduced Order Models


A.1 Parametric Models

Let $X(t)$, $t \in I$, be a real/complex-valued random function with mean 0, finite variance,
and correlation function $r(s, t) = E[X(s) X(t)^*]$. Our objective is to construct
parametric models for $X(t)$. Arguments similar to those presented in this section can
be used to construct parametric models for vector-valued random functions ([1],
Chap. 6).


A.1.1 Karhunen-Loève Expansion

The Karhunen-Loève expansion has been introduced in Sect. 3.6.5. It was shown that,
if the correlation function $r(s, t) = E[X(s) X(t)^*]$ is square integrable and continuous
in $I \times I$, we have

$$X(t) = \mathop{\mathrm{l.i.m.}}_{n \to \infty} \sum_{k=1}^n \lambda_k^{1/2} X_k \varphi_k(t), \quad t \in I, \quad \text{where}$$

$$X_k = \lambda_k^{-1/2} \int_I X(t) \varphi_k(t)^*\, dt, \quad E[X_k] = 0, \quad \text{and} \quad E[X_k X_l^*] = \delta_{kl}, \tag{A.1}$$

and $r(t, s) = \sum_{k=1}^\infty \lambda_k \varphi_k(t) \varphi_k(s)^*$ (Theorem 3.22).

Parametric models can be obtained for random functions admitting Karhunen-Loève
expansions by truncation. For example, the processes
$X_n(t) = \sum_{k=1}^n \lambda_k^{1/2} X_k \varphi_k(t)$, $n = 1, 2, \ldots$, are parametric models for $X(t)$ obtained by truncating the
expansion of $X(t)$ in (A.1). The sequence of processes $\{X_n(t)\}$ converges in m.s. to
$X(t)$ as $n \to \infty$ by (A.1).

Theorem A.1 The mean square error of a parametric model $X_n(t)$ for $X(t)$ is
$E[|X(t) - X_n(t)|^2] = \sum_{k=n+1}^\infty \lambda_k |\varphi_k(t)|^2$.


M. Grigoriu, Stochastic Systems, Springer Series in Reliability Engineering, 
DOI: 10.1007/978-1-4471-2327-9, © Springer- Verlag London Limited 2012 
Proof The equality $E[|X(t) - X_n(t)|^2] = \sum_{k,l=n+1}^\infty \lambda_k^{1/2} \lambda_l^{1/2} E[X_k X_l^*] \varphi_k(t) \varphi_l(t)^*$
and the property $E[X_k X_l^*] = \delta_{kl}$ yield the stated result. The error
$E[|X(t) - X_n(t)|^2]$ approaches 0 as $n \to \infty$ by the m.s. convergence $X_n(t) \to X(t)$. Note
also that the difference between the variances of $X(t)$ and $X_n(t)$ is
$\mathrm{Var}[X(t)] - \mathrm{Var}[X_n(t)] = \sum_{k=n+1}^\infty \lambda_k |\varphi_k(t)|^2 \ge 0$, so that the variance of $X_n(t)$ is smaller than
that of $X(t)$. ▲
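A discretized version of Theorem A.1 can be verified numerically. The sketch below is an illustration under assumptions made here: an exponential correlation function $r(s, t) = \exp(-\rho |s - t|)$ on $I = [0, 1]$, a uniform grid, and a simple rectangle-rule discretization of the integral eigenvalue problem. None of these choices comes from the text.

```python
import numpy as np

rho, n_grid, n_terms = 1.0, 200, 10      # assumed correlation decay, grid size, truncation level
t = np.linspace(0.0, 1.0, n_grid)
dt = t[1] - t[0]
r = np.exp(-rho * np.abs(t[:, None] - t[None, :]))   # r(s, t) on the grid

# Discretized eigenproblem: integral of r(t, s) phi(s) ds = lambda phi(t)
lam, vec = np.linalg.eigh(r * dt)
order = np.argsort(lam)[::-1]            # sort eigenvalues in decreasing order
lam = lam[order]
phi = vec[:, order] / np.sqrt(dt)        # normalized so that sum(phi_k^2) * dt = 1

# Variance of the truncated model X_n(t) and mean square error of Theorem A.1
var_truncated = (lam[:n_terms] * phi[:, :n_terms] ** 2).sum(axis=1)
mse = (lam[n_terms:] * phi[:, n_terms:] ** 2).sum(axis=1)
```

On the grid, `var_truncated + mse` reproduces the variance $r(t, t) = 1$ of $X(t)$, and `var_truncated` never exceeds it, in agreement with the proof above.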

Theorem A.2 If $X(t)$ is a Gaussian function, so are its parametric models $X_n(t)$. If
$X(t)$ is weakly stationary, $X_n(t)$ may not have this property.

Proof The random functions $X_n(t)$, $n \ge 1$, are Gaussian as linear forms of the
Gaussian random variables $\{X_k\}$. That Karhunen-Loève expansions $X_n(t)$ of weakly
stationary processes $X(t)$ can be nonstationary follows from examples showing that
the variance of $X_n(t)$ is time dependent in contrast to that of $X(t)$ [2]. ▲

Karhunen-Loeve expansions provide an alternative manner of specifying the 
probability law for Gaussian functions. However, they give no information beyond 
the second moment properties when dealing with non-Gaussian functions. 

Example A.1 Let $X(t)$ and $Y(t)$ be diffusion processes defined by the stochastic differential
equations

$$dX(t) = -\rho X(t)\, dt + \sqrt{2\rho}\, dB(t), \quad t \ge 0, \quad \text{and}$$

$$dY(t) = -\rho Y(t)\, dt + \sqrt{\rho\,(3 - Y(t)^2)}\, dB(t), \quad t \ge 0. \tag{A.2}$$

The stationary solutions of these equations have the same second moment properties,
that is, $E[X(t)] = E[Y(t)] = 0$ and $E[X(s) X(t)] = E[Y(s) Y(t)] = \exp(-\rho |s - t|)$,
but different marginal distributions, Gaussian for $X(t)$ and uniformly distributed in
$[-\sqrt{3}, \sqrt{3}]$ for $Y(t)$. Yet, they share the same Karhunen-Loève expansion. ◊

Proof That $E[X(t)] = E[Y(t)] = 0$ holds is left as an exercise. For $t > s$ we have
$Y(t) = Y(s) e^{-\rho(t - s)} + \int_s^t e^{-\rho(t - u)} \sqrt{\rho\,(3 - Y(u)^2)}\, dB(u)$, so that
$E[Y(t) Y(s)] = E[Y(s)^2]\, e^{-\rho(t - s)}$ since $E\big[Y(s) \int_s^t e^{-\rho(t - u)} \sqrt{\rho\,(3 - Y(u)^2)}\, dB(u)\big] = 0$. The marginal
distribution $f_Y$ of $Y(t)$ satisfies the Fokker-Planck equation (Theorem 7.4)

$$\frac{\partial f_Y(y, t)}{\partial t} = \frac{\partial \big(\rho\, y f_Y(y, t)\big)}{\partial y} + \frac{1}{2} \frac{\partial^2 \big(\rho\,(3 - y^2) f_Y(y, t)\big)}{\partial y^2}$$

whose stationary solution $f_{st,Y}(y) = \lim_{t \to \infty} f_Y(y, t)$ is uniform in $(-\sqrt{3}, \sqrt{3})$.
Similar considerations show that the stationary density of $X(t)$ is
$f_{st,X}(x) = \exp(-x^2/2)/\sqrt{2\pi}$. ▲

Definition A.1 A real-valued, weakly stationary stochastic process $X(t)$ is said to be
m.s. periodic with period $T > 0$ if its correlation function $r(\tau) = E[X(t) X(t + \tau)]$
is periodic with period $T$.

The correlation function of a m.s. periodic process $X(t)$ with period $T$
(Definition A.1) admits the representation $r(\tau) = \sum_{k=-\infty}^\infty c_k \exp(i k \nu_0 \tau)$, where
$c_k \ge 0$ and $\nu_0 = 2\pi/T$. The spectral distribution $S(\nu)$ of $X(t)$ is an increasing
piecewise constant function. It is common in applications to interpret
$s(\nu) = \sum_{k=-\infty}^\infty c_k \delta(\nu - k \nu_0)$ as the spectral density of $X(t)$.

Example A.2 Let $X(t)$ be a real-valued, m.s. periodic process with period $T > 0$
and mean 0, so that the covariance function and spectral density of $X(t)$ are
$c(\tau) = \sum_{k=-\infty}^\infty c_k \exp(i k \nu_0 \tau)$ and $s(\nu) = \sum_{k=-\infty}^\infty c_k \delta(\nu - k \nu_0)$, where $c_k \ge 0$ and
$\nu_0 = 2\pi/T$. The Karhunen-Loève expansion of $X(t)$ in the time interval $I = [0, T]$ is
given by (A.1) with $\lambda_k = T c_k$ and $\varphi_k(t) = \exp(i k \nu_0 t)/\sqrt{T}$, $k = \pm 1, \pm 2, \ldots$, and
coincides with its spectral representation. ◊

Example A.3 Let $X(t)$, $t \in \mathbb{R}$, be a weakly stationary, real-valued stochastic process
with mean 0, correlation function $r(\tau) = E[X(t) X(t + \tau)]$, and spectral density $s(\nu)$,
$\nu \in \mathbb{R}$. The linear operator $\varphi(t) \mapsto \int_{\mathbb{R}} r(t - s) \varphi(s)\, ds$, $t \in \mathbb{R}$, has a continuous
spectrum with eigenvalues $\lambda = 2\pi s(\nu)$ and eigenfunctions $\exp(i \nu t)$, $\nu \in \mathbb{R}$, so that its
Karhunen-Loève expansion is $X(t) = \int_{-\infty}^\infty e^{i \nu t}\, dZ(\nu)$, where $Z(\nu)$ has orthogonal
increments with $E[dZ(\nu)] = 0$ and $E[|dZ(\nu)|^2] = s(\nu)\, d\nu$ [3]. ◊

This example shows that Karhunen-Loeve expansions developed on unbounded 
intervals, the entire real line in the previous example, depend on an uncountable 
number of random variables, so that they cannot be used to construct parametric 
models. 

Example A.4 Let $X(t)$, $t \in \mathbb{R}$, be a weakly stationary process with mean 0, correlation
function $r(\tau) = E[X(t) X(t + \tau)]$, and spectral density $s(\nu)$. Let $\tilde{X}(t)$ be a periodic
process with period $T > 0$ such that $\tilde{X}(t, \omega) = X(t, \omega)$, $t \in (-T/2, T/2)$. Consider
the sequence of processes $\tilde{X}_n(t)$, $n = 1, 2, \ldots$, representing periodic extensions of
truncated Karhunen-Loève expansions of $X(t)$ in the time interval $(-T/2, T/2)$.
The processes $\tilde{X}_n(t)$ have Karhunen-Loève representations similar to those of m.s.
periodic processes and converge in m.s. to $X(t)$ in $(-T/2, T/2)$ as $n \to \infty$. ◊

Proof Note that $\tilde{X}(t)$ is not weakly stationary, its samples have jumps at
$t = T/2 + rT$, $r \in \mathbb{Z}$, and $\tilde{r}(s, t) = E[\tilde{X}(s) \tilde{X}(t)]$ coincides with $E[X(s) X(t)]$ in
$(-T/2, T/2) \times (-T/2, T/2)$ and provides a periodic extension of $E[X(s) X(t)]$ to
$\mathbb{R}^2$. Let $\tilde{X}_n(t) = \sum_{k=-n}^n W_k e^{i k \nu_0 t}$, where $W_k = (1/T) \int_{-T/2}^{T/2} X(t) e^{-i k \nu_0 t}\, dt$. Since
$E[W_k] = 0$ and

$$\tilde{r}_{kl} = E[W_k W_l^*] = (1/T^2) \int_{-T/2}^{T/2} \int_{-T/2}^{T/2} r(t - s)\, e^{-i(k \nu_0 t - l \nu_0 s)}\, dt\, ds,$$

the correlation function $\tilde{r}_n(t, s) = \sum_{k,l=-n}^n \tilde{r}_{kl}\, e^{i(k \nu_0 t - l \nu_0 s)}$ of $\tilde{X}_n(t)$ represents a
partial sum of the Fourier series $\tilde{r}(t, s) = \sum_{k,l=-\infty}^\infty \tilde{r}_{kl}\, e^{i(k \nu_0 t - l \nu_0 s)}$ of the correlation
function of $\tilde{X}(t)$. If $E[X(s) X(t)]$ is continuous and has continuous partial
derivatives in $(-T/2, T/2) \times (-T/2, T/2)$, the Fourier series of $\tilde{r}(t, s)$ converges
to $\tilde{r}(t, s)$ in this rectangle ([4], Sect. 7.3), so that the sequence of processes $\{\tilde{X}_n(t)\}$
converges in m.s. to $X(t)$ as $n \to \infty$ in $(-T/2, T/2)$.

The processes $\tilde{X}_n(t)$ can be given in the form

$$\tilde{X}_n(t) = \sum_{k=-n}^n \Big( \sum_{l=-n}^n \beta_{lk}\, e^{i l \nu_0 t} \Big) U_k, \tag{A.3}$$

where $W = (W_{-n}, \ldots, W_n)$, $W = \beta U$, $U = (U_{-n}, \ldots, U_n)$ is a random vector with
uncorrelated coordinates, and $\beta$ is a deterministic matrix that can be obtained,
for example, by Cholesky's decomposition. The expressions of $\tilde{X}_n(t)$, $t \in I = (-T/2, T/2)$,
resemble truncated Karhunen-Loève representations for m.s. periodic
processes. ▲


A.1.2 Spectral Representation 

We have seen in Sect. 3.6.4 that m.s. continuous, weakly stationary random func- 
tions admit spectral representations of the type in (3.25) and (3.30). In contrast 
to Karhunen-Loeve expansions that hold for both weakly stationary and nonsta- 
tionary random functions, spectral representations can only be used for weakly 
stationary random functions. As for Karhunen-Loeve expansions, spectral repre- 
sentations define completely the probability law for Gaussian random functions, but 
provide no information beyond the first two moments for non-Gaussian functions. 

Parametric models based on the spectral representation theorem resemble the 
models used by some Monte Carlo simulation algorithms (Sect. 3.8.1). For example, 
let X(t) be a m.s. continuous, weakly stationary real-valued stochastic process with 
spectral representation 



$$X(t) = \int_0^\infty \big[\cos(\nu t)\, dU(\nu) + \sin(\nu t)\, dV(\nu)\big], \tag{A.4}$$


where $E[U(\nu)] = E[V(\nu)] = 0$, $E[dU(\nu)\, dV(\nu')] = 0$ for $\nu \ne \nu'$, and
$E[dU(\nu)^2] = E[dV(\nu)^2] = 2\, dS(\nu) = 2 s(\nu)\, d\nu = g(\nu)\, d\nu$ for all $\nu, \nu' \ge 0$ (Theorem 3.46). The
latter equalities hold if $S(\nu)$ is absolutely continuous so that it admits a spectral
density $s(\nu) = dS(\nu)/d\nu$ and, consequently, a one-sided spectral density $g(\nu)$ (3.26).

Suppose $X(t)$ is as in Theorem 3.18 and define the parametric models

$$X_n(t) = \sum_{k=1}^n \sigma_k \big[A_k \cos(\nu_k t) + B_k \sin(\nu_k t)\big], \quad n = 1, 2, \ldots, \tag{A.5}$$

where $A_k$, $B_k$ are uncorrelated random variables with mean 0 and variance 1,
$0 < \bar{\nu} < \infty$ denotes a cutoff frequency, $\nu_k = (k - 1/2)(\bar{\nu}/n)$, $\sigma_k^2 = \int_{I_k} g(\nu)\, d\nu$,
and $I_k = [\nu_k - \bar{\nu}/(2n), \nu_k + \bar{\nu}/(2n)]$ for $k = 1, 2, \ldots, n$. We have seen that the second
moment properties of $X_n(t)$ converge to those of $X(t)$ as the partition of $(0, \bar{\nu})$ is
refined, that is, $n \to \infty$, and $\bar{\nu} \to \infty$. Moreover, if $X(t)$ is Gaussian, then $X_n(t)$ is
a stationary Gaussian process for each $n$ and becomes a version of $X(t)$ as $n \to \infty$
(Theorem 3.46).

Theorem A.3 The parametric models $X_n(t)$ in (A.5) have mean 0 as $X(t)$. The
discrepancies between the variances and covariance functions of $X_n(t)$ and $X(t)$ are
$\mathrm{Var}[X(t)] - \mathrm{Var}[X_n(t)] = \int_{\bar{\nu}}^\infty g(\nu)\, d\nu$ and

$$c(\tau) - c_n(\tau) = \sum_{k=1}^n \int_{I_k} g(\nu) \big[\cos(\nu \tau) - \cos(\nu_k \tau)\big]\, d\nu + \int_{\bar{\nu}}^\infty g(\nu) \cos(\nu \tau)\, d\nu, \tag{A.6}$$

for $\bar{\nu}$ arbitrary, where $c(\tau) = E[X(t) X(t + \tau)]$ and $c_n(\tau) = E[X_n(t) X_n(t + \tau)]$.

Proof The error in the variance of $X_n(t)$ results from $\mathrm{Var}[X(t)] = \int_0^\infty g(\nu)\, d\nu$ and
$\mathrm{Var}[X_n(t)] = \sum_{k=1}^n \sigma_k^2 = \int_0^{\bar{\nu}} g(\nu)\, d\nu$. The formula in (A.6) follows from the expressions
$c(\tau) = \int_0^\infty g(\nu) \cos(\nu \tau)\, d\nu$ and $c_n(\tau) = \sum_{k=1}^n \sigma_k^2 \cos(\nu_k \tau)$ of the covariance
functions of $X(t)$ and $X_n(t)$. The first term on the right side of (A.6) vanishes
as $n \to \infty$ so that $\lim_{n \to \infty} [c(\tau) - c_n(\tau)] = \int_{\bar{\nu}}^\infty g(\nu) \cos(\nu \tau)\, d\nu$ for a fixed but
arbitrary $\bar{\nu}$. The discrepancy between $c(\tau)$ and $c_n(\tau)$ vanishes if, in addition to mesh
refining, the cutoff frequency $\bar{\nu}$ is extended to infinity. ▲
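The two error sources in Theorem A.3 can be examined with a short numerical sketch. The one-sided spectral density $g(\nu) = 2\rho/[\pi(\rho^2 + \nu^2)]$, which corresponds to the covariance function $c(\tau) = \exp(-\rho|\tau|)$, as well as the values of $\rho$, the cutoff frequency, and $n$, are assumptions chosen here for illustration.

```python
import numpy as np

rho, nu_bar, n = 1.0, 50.0, 400          # assumed decay rate, cutoff frequency, number of terms

edges = np.linspace(0.0, nu_bar, n + 1)  # end points of the intervals I_k
nu_k = 0.5 * (edges[:-1] + edges[1:])    # midpoints (k - 1/2) nu_bar / n

def G(nu):
    """Integral of g over (0, nu) for g(nu) = 2 rho / (pi (rho^2 + nu^2))."""
    return (2.0 / np.pi) * np.arctan(nu / rho)

sigma2 = G(edges[1:]) - G(edges[:-1])    # sigma_k^2 = integral of g over I_k

def cov_model(tau):
    """Covariance c_n(tau) = sum_k sigma_k^2 cos(nu_k tau) of the model in (A.5)."""
    return float(np.sum(sigma2 * np.cos(nu_k * tau)))

variance_gap = 1.0 - sigma2.sum()        # Var[X(t)] - Var[X_n(t)] with Var[X(t)] = 1
cov_error = np.exp(-rho * 0.5) - cov_model(0.5)
```

Here `variance_gap` equals the tail integral of $g$ beyond the cutoff frequency and does not depend on $n$, while `cov_error` also contains the contribution of the partition of $(0, \bar{\nu})$.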

Example A.5 Let $X(t)$ be a m.s. periodic process with period $T > 0$, mean 0, covariance
function $c(\tau) = E[X(t + \tau) X(t)] = \sum_{k=-\infty}^\infty c_k \exp(i k \nu_0 \tau)$, and spectral density
$s(\nu) = \sum_{k=-\infty}^\infty c_k \delta(\nu - k \nu_0)$, where $c_k \ge 0$ and $\nu_0 = 2\pi/T$. The spectral representation
of $X(t)$ is $X(t) = \sum_{k=-\infty}^\infty Z_k \exp(i k \nu_0 t)$, where $\{Z_k\}$ are random variables with
$E[Z_k] = 0$ and $E[Z_k Z_l^*] = c_k \delta_{kl}$. This representation coincides with the Karhunen-Loève
expansion of $X(t)$ in Example A.2. ◊


A.1.3 Sampling Theorem 

We give the classical form of the sampling theorem for deterministic functions, extend 
it to weakly stationary bandlimited stochastic processes, and suggest constructions 
of parametric models based on this theorem. 

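Theorem A.4 Let $x : \mathbb{R} \to \mathbb{R}$ be a continuous function with Fourier transform
taking non-zero values in a bounded frequency range $[-\bar{\nu}, \bar{\nu}]$, $0 < \bar{\nu} < \infty$. Then

$$x(t) = \sum_{k=-\infty}^\infty x(t_0 + k\tau)\, \alpha_k(t - t_0), \tag{A.7}$$

where

$$\alpha_k(u) = \frac{\sin\big(\pi (u - k\tau)/\tau\big)}{\pi (u - k\tau)/\tau}, \tag{A.8}$$

$\tau = \pi/\bar{\nu}$, and $t_0 \in \mathbb{R}$ is an arbitrary constant.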
The theorem states that a band-limited signal $x(t)$ is uniquely defined by its values at a
countable number of arguments spaced equally at $\tau = \pi/\bar{\nu}$, that is, we can reconstruct
$x(t)$ from its values $x(t_0 + k\tau)$, $k = 0, \pm 1, \pm 2, \ldots$, at a countable set of times. The
ratio $1/\tau = \bar{\nu}/\pi$ is called the Nyquist sampling rate. Note also that $\alpha_k(u) = 1$ for
$u = k\tau$, $\alpha_k(u) = 0$ for $u = q\tau$, $q \in \mathbb{Z} \setminus \{k\}$, $|\alpha_k(u)| \le 1$ for all $u \in \mathbb{R}$, and $|\alpha_k(u)| \to 0$
as $|u| \to \infty$.
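The reconstruction in (A.7) can be tried on a simple band-limited function. The sketch below assumes the standard cardinal sine form of the interpolating functions, $\alpha_k(u) = \sin(\pi(u/\tau - k))/(\pi(u/\tau - k))$, and uses the test signal $x(t) = (\sin t / t)^2$, whose Fourier transform vanishes outside $[-2, 2]$. The signal, the cutoff $\bar{\nu} = 3$, and the number of retained terms are choices made here for illustration.

```python
import numpy as np

nu_bar = 3.0                     # assumed band limit, larger than the bandwidth 2 of x
tau = np.pi / nu_bar             # sampling interval tau = pi / nu_bar

def x(t):
    """Test signal (sin t / t)^2; np.sinc(y) is sin(pi y) / (pi y)."""
    return np.sinc(t / np.pi) ** 2

def reconstruct(t, n_terms=200, t0=0.0):
    """Truncated version of (A.7) with terms k = -n_terms, ..., n_terms."""
    k = np.arange(-n_terms, n_terms + 1)
    alpha_k = np.sinc((t - t0) / tau - k)        # cardinal sine interpolating functions
    return float(np.sum(x(t0 + k * tau) * alpha_k))

t_test = 0.7
error_between_nodes = abs(reconstruct(t_test) - x(t_test))
error_at_node = abs(reconstruct(2 * tau) - x(2 * tau))
```

The truncated series matches the signal at the sampling times up to rounding and is accurate between them because the samples of this signal decay quickly.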

The sampling theorem extends directly to stochastic processes whose spectral 
densities have a bounded support. 

Theorem A.5 Let $X(t)$, $t \in \mathbb{R}$, be a real-valued, weakly stationary process defined
on a probability space $(\Omega, \mathcal{F}, P)$ with mean 0, correlation function
$r(\tau) = E[X(t) X(t + \tau)]$, and one-sided spectral density $g(\nu)$ with support $[0, \bar{\nu}]$, $0 < \bar{\nu} < \infty$.
If $X(t)$ has a.s. continuous samples, then

$$X(t, \omega) = \sum_{k=-\infty}^\infty X(t_0 + k\tau, \omega)\, \alpha_k(t - t_0) \tag{A.9}$$

for almost all $\omega \in \Omega$.

Proof Since $X(t)$ can be viewed as a superposition of harmonics with random amplitudes
and frequencies in the range $[0, \bar{\nu}]$ (Theorem 3.17), its samples cannot have
harmonics outside this range. Since the samples $X(\cdot, \omega)$ of $X(t)$ are also continuous
functions, Theorem A.4 can be applied sample by sample. ▲

The representations in (A.7) and (A.9) involve values of $x(t)$ and $X(t, \omega)$ at a
countably infinite set of times. We consider two approximations for the representation
of $X(t)$ that use values of $X(t)$ at a finite number of times. The first approximation,

$$X_n(t) = \sum_{k=-n}^n X(t_0 + k\tau)\, \alpha_k(t - t_0), \quad n = 1, 2, \ldots, \tag{A.10}$$

is referred to as a global approximation and constitutes a truncated version of (A.9).
The second approximation, referred to as a local approximation, is

$$\tilde{X}_n(t) = \sum_{k=n_t-n}^{n_t+n+1} X(k\tau)\, \alpha_k(t), \quad n = 1, 2, \ldots, \tag{A.11}$$

where $n_t = [t/\tau]$ denotes the largest integer smaller than $t/\tau$, $n \ge 1$ is an integer,
and $k\tau$, $k = n_t - n, \ldots, n_t + n + 1$, are time arguments, referred to as nodes. The
process $\tilde{X}_n(t)$ depends on the values of $X(t)$ at $2(n + 1)$ nodes, centered on the cell
$[n_t \tau, (n_t + 1)\tau]$ containing the current time $t$, and is such that $\tilde{X}_n(k\tau) = X(k\tau)$ for
$k = n_t - n, \ldots, n_t + n + 1$, by the properties of $\alpha_k$ ([5], Sect. 5.3.1.3).

Theorem A.6 The sequence of random variables $X_n(t)$ converges in m.s. to $X(t)$ as
$n \to \infty$ at each $t \in \mathbb{R}$ ([6], Sect. 3.7).

Proof Simple arguments show that the covariance function $E[X_n(s) X_n(t)]$ of $X_n(t)$
converges to the covariance function $E[X(s) X(t)]$ of $X(t)$ at any $s, t \in \mathbb{R}$, so that
$X_n(t)$ has the same second-moment properties as $X(t)$ asymptotically as $n \to \infty$.
If $X(t)$ is a Gaussian process, then $X_n(t)$ is Gaussian for each $n \ge 1$ so that it becomes
a version of $X(t)$ as $n \to \infty$. ▲

A drawback of the global representation in (A.10) is that $n$ must increase with $t$,
so that we need a large number of terms to represent $X(t)$ accurately at times $t \gg 0$.
The local approximation in (A.11) does not have this limitation.

Theorem A.7 The sequence of processes $\{\tilde{X}_n(t)\}$ has the properties $\tilde{X}_n(t) \to X(t)$
at every $t$ and $\tilde{c}_n(s, t) = E[\tilde{X}_n(s) \tilde{X}_n(t)] \to c(s, t) = E[X(s) X(t)]$ at every $(s, t)$ as
$n \to \infty$ ([5], Sect. 5.3.1.3).

The representations in (A.10) and (A.11) can be constructed simply and can be
used to approximate both Gaussian and non-Gaussian processes with continuous
samples [3]. Note also that the processes $X_n(t)$ and $\tilde{X}_n(t)$ are not stationary for
$n < \infty$, for example, the covariance function,

$$\tilde{c}_n(s, t) = E[\tilde{X}_n(s) \tilde{X}_n(t)] = \sum_{k=n_t-n}^{n_t+n+1} \alpha_k(t) \Big( \sum_{l=n_s-n}^{n_s+n+1} c((k - l)\tau)\, \alpha_l(s) \Big), \tag{A.12}$$

of $\tilde{X}_n(t)$ in (A.11) depends on $s$ and $t$ rather than the time lag $|t - s|$, and the
coefficients $X(k\tau)$ in the representation $\tilde{X}_n(t)$ given by (A.11) are correlated random
variables.


A.2 Quantizers 

Let $X$ be a random element defined on a probability space $(\Omega, \mathcal{F}, P)$ with values
in a metric space $S$ with metric $\rho$, that is, $X^{-1}(\mathcal{S}) \subset \mathcal{F}$, where $\mathcal{S}$ denotes the
Borel $\sigma$-field generated by the topology on $S$ induced by the metric $\rho$. Quantizers are
simple random elements defined on $(\Omega, \mathcal{F}, P)$ that provide approximations for $X$
by assigning to each point in $X(\Omega) \subset S$ a reproduction selected from a finite subset
of $S$. If the cardinality of this subset is $m$, we deal with an $m$-quantizer.

Definition A.2 Let $m \ge 1$ be an integer and

$$\mathcal{V}_m = \{h : S \to S, \ \text{Borel measurable and } |h| \le m\}, \tag{A.13}$$

where $|h|$ denotes the cardinality of the image of the quantization rule $S \mapsto h(S)$. Any random
element $h(X)$, $h \in \mathcal{V}_m$, is an $m$-quantizer for $X$. We are interested in an optimal
$m$-quantizer, that is, an $m$-quantizer that minimizes the discrepancy between $X$ and $h(X)$
in some sense.

Most studies on quantizers are for $\mathbb{R}^d$-valued random variables $X$ [7, 8], so
that $S = \mathbb{R}^d$ and $\mathcal{S} = \mathcal{B}(\mathbb{R}^d)$. The discrepancy between $X$ and an $m$-quantizer $h(X)$,
$h \in \mathcal{V}_m$, is usually measured by $E[\|X - h(X)\|^r]$ provided $X \in L_r(\Omega, \mathcal{F}, P)$, where
$\|\cdot\|$ is $\|x\| = \big(\sum_{i=1}^d |x_i|^r\big)^{1/r}$ if $1 \le r < \infty$ and $\|x\| = \max_{1 \le i \le d} |x_i|$ if $r = \infty$,
$x \in \mathbb{R}^d$.

The error of a quantizer $h \in \mathcal{V}_m$ with image $h(X(\Omega)) = \{a^{(1)}, \ldots, a^{(m)}\}$
can be calculated from

$$E[\|X - h(X)\|^r] = \sum_{k=1}^m \int_{A_k} \|x - a^{(k)}\|^r\, dF(x), \tag{A.14}$$

where $A_k = h^{-1}(a^{(k)}) \in \mathcal{B}(\mathbb{R}^d)$, $k = 1, \ldots, m$, and $F$ denotes the distribution of $X$.
The elements of $h(X(\Omega))$, that is, the points $a^{(k)} \in \mathbb{R}^d$, are referred to as centers. The
optimal $m$-quantizer is the member of $\mathcal{V}_m$ that has the smallest error, referred to as
the minimal quantization error, so that its error is given by

$$v_{m,r}(X) = \inf_{h \in \mathcal{V}_m} E[\|X - h(X)\|^r]. \tag{A.15}$$

Example A.6 Let $X$ denote a real-valued random variable and let $\{A_k,\ k = 1, \ldots, m\}$
be intervals partitioning the image $X(\Omega)$ of $X$. The corresponding centers $\{a^{(k)}\}$ of
the optimal quantizer for $r = 2$ are $a^{(k)} = \int_{A_k} X \, dP / P(A_k)$ since they minimize the
error $E[\|X - h(X)\|^2] = \sum_{k=1}^{m} \big[ \int_{A_k} X^2 \, dP - 2 a^{(k)} \int_{A_k} X \, dP + (a^{(k)})^2 \int_{A_k} dP \big]$.
If $X$ is uniformly distributed on a bounded interval, then $\{a^{(k)}\}$ are the midpoints of
the intervals $\{A_k\}$, so that $\{a^{(k)}\}$ and $\{A_k\}$ define a classical Voronoi tessellation. ◇

The following theorem shows that the quantization problem is equivalent to the
$m$-center problem, whose objective is to find a set $\alpha \subset \mathbb{R}^d$ of $m$ elements
such that $E[\min_{a \in \alpha} \|X - a\|^r]$ is minimized. The theorem shows that the optimal
quantizer constructed on $m$ centers $\{a^{(k)}\}$ is $h = \sum_{k=1}^{m} a^{(k)} 1_{A_k}$, where $\{A_k\}$ is the
Voronoi tessellation with centers $\{a^{(k)}\}$. The quantizer is optimal in the sense that its
error is $v_{m,r}(X)$. An extensive and useful discussion on properties of quantizers can
be found in [7] (Sect. 1.4).

Theorem A.8 The $m$-quantization error $v_{m,r}(X)$ in (A.15) can be calculated from

$$ v_{m,r}(X) = \inf_{\alpha \subset \mathbb{R}^d,\ |\alpha| \le m} E\Big[ \min_{a \in \alpha} \|X - a\|^r \Big]. \tag{A.16} $$

Proof First note that $Y_\alpha(\omega) = \min_{a \in \alpha} \|X(\omega) - a\|^r$, that is, the smallest of the
distances $\|X(\omega) - a\|^r$ between a sample $X(\omega)$ of $X$ and the points $a \in \alpha$, is a
sample of a positive random variable $Y_\alpha$ depending on $\alpha$. The solution of the $m$-center
problem is a set $\alpha_{\rm opt}$ with the property $E[Y_{\alpha_{\rm opt}}] \le E[Y_\alpha]$ for all $\alpha \subset \mathbb{R}^d$, $|\alpha| \le m$.

Since an arbitrary $h \in \mathcal{V}_m$ defines both a partition $\{A_k\}$ of $X(\Omega)$ and a set of centers
$\alpha = \{a^{(k)}\}$ for this partition, we have $E[\|X - h(X)\|^r] = \sum_{k=1}^{m} \int_{A_k} \|x - a^{(k)}\|^r \, dF(x)$.
The inequality $\int_{A_k} \|x - a^{(k)}\|^r \, dF(x) \ge \int_{A_k} \min_{b \in \alpha} \|x - b\|^r \, dF(x)$ implies
$E[\|X - h(X)\|^r] \ge E[\min_{b \in \alpha} \|X - b\|^r]$. Consider now a Voronoi tessellation
$\{A_k\}$ for $X(\Omega)$ with centers $\{a^{(k)}\}$. The function $h_V = \sum_{k=1}^{m} a^{(k)} 1_{A_k}$ induced by
this tessellation is a member of $\mathcal{V}_m$ and $E[\min_{a \in \alpha} \|X - a\|^r] = E[\|X - h_V(X)\|^r]$.
This observation and the previous inequality show that $E[\min_{a \in \alpha} \|X - a\|^r]$ can
be bounded by values of $E[\|X - h(X)\|^r]$ for some $h \in \mathcal{V}_m$, so that (A.16) holds. ▲
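
The equality used in the proof can be checked numerically. The following is a minimal sketch, assuming a scalar exponential target, three arbitrarily chosen centers, and $r = 2$; the function names and the competing non-Voronoi rule are illustrative choices, not part of the text.

```python
import random

def quantization_error(samples, centers, assign, r=2):
    """Empirical E[|X - h(X)|^r] for the rule h(x) = centers[assign(x)]."""
    return sum(abs(x - centers[assign(x)]) ** r for x in samples) / len(samples)

def nearest(centers):
    """Voronoi rule: index of the center closest to x."""
    return lambda x: min(range(len(centers)), key=lambda k: abs(x - centers[k]))

def m_center_error(samples, centers, r=2):
    """Empirical E[min_a |X - a|^r] over the set of centers."""
    return sum(min(abs(x - a) ** r for a in centers) for x in samples) / len(samples)

rng = random.Random(0)
samples = [rng.expovariate(1.0) for _ in range(5000)]   # assumed target: unit-mean exponential
centers = [0.3, 1.0, 2.5]                               # hypothetical centers

err_voronoi = quantization_error(samples, centers, nearest(centers))
err_centers = m_center_error(samples, centers)
# A non-Voronoi rule on the same centers: cells [0,1), [1,2), [2,inf)
err_other = quantization_error(samples, centers, lambda x: min(int(x), 2))

print(err_voronoi, err_centers, err_other)
```

The Voronoi rule reproduces the $m$-center error exactly for the given centers, while any other assignment to the same centers can only increase the error.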

Quantizers are related to conditional expectations. Let $X$ be a random variable with
finite variance defined on a probability space $(\Omega, \mathcal{F}, P)$. Let $\{\Omega_k,\ k = 1, \ldots, m\}$ be a
measurable partition of $\Omega$ and set $\mathcal{G} = \sigma(\Omega_1, \ldots, \Omega_m)$. The conditional expectation
$E[X \mid \mathcal{G}]$ is $\mathcal{G}$-measurable, has the smallest m.s. error relative to $X$, that is,
$E[(X - E[X \mid \mathcal{G}])^2] \le E[(X - Y)^2]$ for all $\mathcal{G}$-measurable $Y$ with finite variance, and has the
expression $E[X \mid \mathcal{G}] = \sum_{k=1}^{m} E[X \mid \Omega_k] \, 1_{\Omega_k}$ with $E[X \mid \Omega_k] = \int_{\Omega_k} X \, dP / P(\Omega_k)$.

Since the simple random variable $E[X \mid \mathcal{G}]$ is an optimal m.s. approximation for $X$,
it is an $m$-quantizer for $r = 2$.

Example A.7 Let $X \sim U(0, 1)$ and let $A_k = [(k - 1)/m, k/m)$, $k = 1, \ldots, m$, be
a partition of $X(\Omega) = [0, 1]$. The optimal $m$-quantizer corresponding to this partition
is $h(x) = \sum_{k=1}^{m} a^{(k)} 1(x \in A_k)$, where $a^{(k)} = \int_{A_k} x \, dx / \int_{A_k} dx = (k - 1/2)/m$
(Example A.6), that is, a simple random variable with values $(k - 1/2)/m$ of
probabilities $1/m$. The conditional expectation of $X$ with respect to the $\sigma$-field
$\mathcal{G} = \sigma(\Omega_k = X^{-1}(A_k),\ k = 1, \ldots, m)$ is (Sect. 2.11)

$$ E[X \mid \mathcal{G}] = \sum_{k=1}^{m} \frac{\int_{\Omega_k} X \, dP}{P(\Omega_k)} \, 1_{\Omega_k} = \sum_{k=1}^{m} \frac{k - 1/2}{m} \, 1_{\Omega_k}, $$

a simple random variable with the same probability law as $h(X)$. ◇
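
The construction in Example A.7 can be reproduced in a few lines. This is a small sketch for $X \sim U(0, 1)$ with $m = 4$; the helper names are hypothetical, and the m.s. error is integrated in closed form rather than estimated.

```python
def uniform_quantizer(m):
    """Cells A_k = [(k-1)/m, k/m), their centers, and probabilities for X ~ U(0, 1)."""
    cells = [((k - 1) / m, k / m) for k in range(1, m + 1)]
    centers = [(lo + hi) / 2 for lo, hi in cells]   # int_{A_k} x dx / int_{A_k} dx
    probs = [hi - lo for lo, hi in cells]           # P(X in A_k) = 1/m
    return cells, centers, probs

def ms_error(cells, centers):
    """E[(X - h(X))^2] = sum_k int_{A_k} (x - a_k)^2 dx, integrated exactly."""
    return sum(((hi - a) ** 3 - (lo - a) ** 3) / 3 for (lo, hi), a in zip(cells, centers))

cells, centers, probs = uniform_quantizer(4)
error = ms_error(cells, centers)                    # equals 1 / (12 m^2)
second_moment = sum(p * a * a for p, a in zip(probs, centers))

print(centers, error, second_moment)
```

For this quantizer the second moment is $1/3 - 1/(12 m^2)$, slightly below the exact value $E[X^2] = 1/3$.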

Since quantizers can be conditional expectations, they are likely to underestimate
the uncertainty in $X$, as indicated by the following result. Accordingly, reliability
estimates based on quantizers may be overoptimistic.

Theorem A.9 Let $X$ be a real-valued random variable with finite variance that is
defined on a probability space $(\Omega, \mathcal{F}, P)$, and let $\mathcal{G} = \sigma(\Omega_k,\ k = 1, \ldots, m)$ be a
sub-$\sigma$-field of $\mathcal{F}$ corresponding to a measurable partition $\{\Omega_k,\ k = 1, \ldots, m\}$ of $\Omega$.
Then

$$ E\big[ (E[X \mid \mathcal{G}])^2 \big] \le E[X^2]. \tag{A.17} $$


Proof We have $0 \le E[(X - \hat{X})^2] = E[X^2] + E[\hat{X}^2] - 2 E[X \hat{X}]$ with the notation
$\hat{X} = E[X \mid \mathcal{G}]$. Since $\hat{X}$ is the best m.s. estimator for $X$, $E[(X - \hat{X}) Z] = 0$ for any
$\mathcal{G}$-measurable random variable $Z$ (Theorem 2.13). For the special case $Z = \hat{X}$, we
have $E[X \hat{X}] = E[\hat{X}^2]$ so that $0 \le E[(X - \hat{X})^2] = E[X^2] - E[\hat{X}^2]$. ▲
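
The two identities in the proof hold exactly for an empirical measure, which makes them easy to check. The sketch below assumes Gaussian samples and a four-cell partition defined through the values of $X$; both choices are illustrative.

```python
import random

rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]      # samples of X (assumed Gaussian)

def cell(v, edges=(-1.0, 0.0, 1.0)):
    """Index of the partition cell containing v; the cells are defined through X."""
    return sum(v >= e for e in edges)

# Cell averages play the role of E[X | Omega_k] = int_{Omega_k} X dP / P(Omega_k)
sums, counts = [0.0] * 4, [0] * 4
for v in x:
    sums[cell(v)] += v
    counts[cell(v)] += 1
means = [s / c for s, c in zip(sums, counts)]
x_hat = [means[cell(v)] for v in x]                 # samples of E[X | G]

m2_x = sum(v * v for v in x) / len(x)               # E[X^2]
m2_hat = sum(v * v for v in x_hat) / len(x)         # E[X_hat^2]
cross = sum(a * b for a, b in zip(x, x_hat)) / len(x)   # E[X X_hat]

print(m2_x, m2_hat, cross)
```

The output shows $E[X \hat{X}] = E[\hat{X}^2] \le E[X^2]$ up to rounding, which is the mechanism by which quantizers of this type lose second-moment content.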

The construction of quantizers for random vectors can be extended to random
functions provided these functions are approximated by parametric models. For example,
let $X(t)$ be a real-valued Gaussian process defined on a bounded time interval $I \subset \mathbb{R}$
and let $X_m(t) = \sum_{k=1}^{m} \sqrt{\lambda_k} \, Z_k \, \varphi_k(t)$, $t \in I$, be a truncated Karhunen-Loève
expansion of this process, where $\{\lambda_k\}$ and $\{\varphi_k(t)\}$ are the eigenvalues and the eigenfunctions
of the correlation function of $X(t)$ and $\{Z_k\}$ are independent $N(0, 1)$ variables. The
parametric model $X_m(t)$ approximating $X(t)$ depends on an $m$-dimensional vector
$Z = (Z_1, \ldots, Z_m)$ that can be described approximately by quantizers of the type
considered in this section [9, 10].

In summary, quantizers $h(X)$ for a target random vector $X$ are simple $\mathbb{R}^d$-valued
random variables defined on the same probability space as $X$ via measurable mappings
$h : \mathbb{R}^d \to \mathbb{R}^d$, the construction of quantizers involves moments of $X$ up to an order
depending on the definition of the error in (A.14), quantization and $m$-center problems
are equivalent, and quantizers are closely related to conditional expectations for $r = 2$.


A.3 Stochastic Reduced Order Models (SROMs) 

There are notable similarities and differences between quantizers and stochastic 
reduced order models (SROMs). Both quantizers and SROMs are simple random 
elements that represent target random elements in some optimal sense. However, 
distinct optimality criteria are used to build quantizers and SROMs. Quantizers are 
defined on the same probability space as the random elements they approximate, 
while SROMs need not have this property. The construction of quantizers for
random functions requires parametric models for these functions, while
the construction of SROMs does not.

Definition A.3 Let $X$ be a random element defined on a probability space $(\Omega, \mathcal{F}, P)$
with values in a metric space $S$ and let $\mathcal{S}$ be a Borel $\sigma$-field generated by the open
balls in $S$. A SROM $\tilde{X}$ for $X$ is a simple random element with values in $S$ whose
properties are similar to those of $X$ in the sense defined by the objective function
in (A.18). If the cardinality of the range of $\tilde{X}$ is $m$, we say that $\tilde{X}$ has size $m$. The
samples $(\tilde{x}^{(1)}, \ldots, \tilde{x}^{(m)})$ of a SROM $\tilde{X}$ and their probabilities $(p_1, \ldots, p_m)$ define
$\tilde{X}$ completely.

As for quantizers, the size $m$ of SROMs is assumed to be specified since it is
primarily determined by the admissible computational effort. The defining parameters
$\{(\tilde{x}^{(k)}, p_k),\ k = 1, \ldots, m\}$ of a SROM $\tilde{X}$ of size $m$ for a random element
$X$ are the solution of the optimization problem

$$ \min \{ e(p) \}, \quad \text{with constraints} \quad \sum_{k=1}^{m} p_k = 1, \quad p_k \ge 0, \quad k = 1, \ldots, m, \tag{A.18} $$

where $e(p) = \sum_{u \ge 1} \alpha_u e_u(p)$ measures differences between moments, correlations,
distributions, and other properties of $\tilde{X}$ and $X$, $\alpha_u > 0$ are weighting factors, and
$p = (p_1, \ldots, p_m)$ denotes a probability vector. Note that the samples $\{\tilde{x}^{(k)}\}$ of $\tilde{X}$ are
numbers, vectors, or functions of time and/or space if $X$ is a random variable, vector,
or random function, respectively.
A two-step suboptimal algorithm can be used to construct SROMs. First, generate
$n_{\rm set} \ge 1$ sets of $m$ independent samples of $X$, referred to as candidate samples
for $\tilde{X}$, and solve the optimization problem in (A.18) to find the optimal probability
vector $p^{(i)}$ and the corresponding error $e(p^{(i)})$ for each set $i = 1, \ldots, n_{\rm set}$ of
$m$ independent samples of $X$. Second, select for the defining parameters of $\tilde{X}$ the
set $i_0$ of $m$ independent samples and the probability vector $p^{(i_0)}$ with the property
$e(p^{(i_0)}) = \min_{1 \le i \le n_{\rm set}} \{ e(p^{(i)}) \}$. Alternative algorithms for constructing SROMs
can be found in [11].
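
The two-step algorithm can be sketched as follows. This is only an illustration under stated assumptions: the target is taken to be a Gamma variable with shape 2 and rate 3, the objective keeps only relative moment errors (no distribution term), and the optimizer is a crude random mass-transfer search standing in for a proper constrained solver. All function names are hypothetical.

```python
import math
import random

def target_moments(q_max, shape=2.0, rate=3.0):
    """Exact moments E[X^q], q = 1..q_max, of the assumed Gamma target."""
    return [math.gamma(shape + q) / (math.gamma(shape) * rate ** q)
            for q in range(1, q_max + 1)]

def moment_error(xs, p, target):
    """e(p): sum of squared relative moment errors (one possible objective)."""
    err = 0.0
    for q, mu in enumerate(target, start=1):
        mu_srom = sum(pk * xk ** q for pk, xk in zip(p, xs))
        err += ((mu_srom - mu) / mu) ** 2
    return err

def fit_probabilities(xs, target, rng, sweeps=1500, step=0.05):
    """Stand-in optimizer: move probability mass between two samples whenever
    the move lowers e(p); p stays nonnegative and sums to one."""
    m = len(xs)
    p = [1.0 / m] * m
    best = moment_error(xs, p, target)
    for _ in range(sweeps):
        i, j = rng.sample(range(m), 2)
        delta = min(p[i], step * rng.random())
        trial = list(p)
        trial[i] -= delta
        trial[j] += delta
        err = moment_error(xs, trial, target)
        if err < best:
            p, best = trial, err
    return p, best

def build_srom(m, n_set, target, rng):
    """Step 1: fit p for each candidate set. Step 2: keep the set with smallest e(p)."""
    best = None
    for _ in range(n_set):
        xs = [rng.gammavariate(2.0, 1.0 / 3.0) for _ in range(m)]   # scale = 1 / rate
        p, err = fit_probabilities(xs, target, rng)
        if best is None or err < best[2]:
            best = (xs, p, err)
    return best

rng = random.Random(0)
target = target_moments(4)
xs, p, err = build_srom(m=10, n_set=5, target=target, rng=rng)
print(err, [round(pk, 4) for pk in p])
```

Because the search starts from equal probabilities and accepts only improving moves, the fitted SROM is never worse, in the sense of $e(p)$, than the same samples taken as equally likely.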

SROMs are not quantizers even for random variables and vectors. Let $h$ be a
member of $\mathcal{V}_m$ in (A.13) with centers $\{a^{(k)}\}$, $k = 1, \ldots, m$, that is, $h$ constitutes a
quantizer for a random variable/vector $X$ defined on a probability space $(\Omega, \mathcal{F}, P)$.
The measurable mapping $h$ generates a measurable partition $\{\Omega_k = X^{-1}(h^{-1}(a^{(k)}))\}$,
$k = 1, \ldots, m$, of $\Omega$ so that $\{P(\Omega_k)\}$ are the probabilities of $\{a^{(k)}\}$. Consider a SROM
$\tilde{X}$ whose samples are the centers of the quantizer of $X$. Generally, the probabilities
$\{p_k\}$ of $\{\tilde{x}^{(k)} = a^{(k)}\}$, $k = 1, \ldots, m$, differ from $\{P(\Omega_k)\}$ since they are the output of an
optimization algorithm with an objective function involving properties of $X$ that differ
from that in (A.14). Different probabilities $\{p_k\}$ result for distinct definitions of the
objective function $e(p)$ in (A.18).

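Example A.8 Let $\tilde{X}$ be a SROM with size $m$ for an $\mathbb{R}^d$-valued random variable
$X$. Errors $e_u(p)$ in (A.18) related to distributions, moments, and correlations can be
defined by

$$ e_1(p) = \int_{\mathbb{R}^d} \big( \tilde{F}(x) - F(x) \big)^2 w_F(x) \, dx, $$

$$ e_2(p) = \sum_{0 < r_1 + \cdots + r_d \le q,\ r_1, \ldots, r_d \ge 0} w_\mu(r_1, \ldots, r_d) \big( \tilde{\mu}(r_1, \ldots, r_d) - \mu(r_1, \ldots, r_d) \big)^2, $$

$$ e_3(p) = \sum_{i, j = 1, \ldots, d;\ j > i} w_r(i, j) \big( \tilde{r}(i, j) - r(i, j) \big)^2, \tag{A.19} $$

where $w_F, w_\mu, w_r > 0$ are weighting functions and $q \ge 1$ is an integer. The
distributions and the moments of $\tilde{X}$ have the expressions

$$ \tilde{F}(x) = P\Big( \bigcap_{i=1}^{d} \{ \tilde{X}_i \le x_i \} \Big) = \sum_{k=1}^{m} p_k \, 1\Big( \bigcap_{i=1}^{d} \{ \tilde{x}_i^{(k)} \le x_i \} \Big), $$

$$ \tilde{\mu}(r_1, \ldots, r_d) = E\Big[ \prod_{i=1}^{d} \tilde{X}_i^{r_i} \Big] = \sum_{k=1}^{m} p_k \prod_{i=1}^{d} \big( \tilde{x}_i^{(k)} \big)^{r_i}, $$

$$ \tilde{r} = E[\tilde{X} \tilde{X}'] = \sum_{k=1}^{m} p_k \, \tilde{x}^{(k)} \big( \tilde{x}^{(k)} \big)', \tag{A.20} $$

where $x = (x_1, \ldots, x_d) \in \mathbb{R}^d$ and $\tilde{x}^{(k)} = (\tilde{x}_1^{(k)}, \ldots, \tilde{x}_d^{(k)}) \in \mathbb{R}^d$.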

Example A.9 Suppose $X$ is a random variable with distribution $F$ and moments
$\mu(q) = E[X^q] = \int x^q \, dF(x)$. The stochastic reduced order model $\tilde{X}$ for $X$ is a
simple random variable taking the values $\tilde{x}^{(k)}$ with probabilities $p_k$, $k = 1, \ldots, m$,
so that its distribution and moments are $\tilde{F}(x) = \sum_{k=1}^{m} p_k \, 1(\tilde{x}^{(k)} \le x)$ and
$\tilde{\mu}(r) = \sum_{k=1}^{m} p_k (\tilde{x}^{(k)})^r$, respectively. The probability vector $p = (p_1, \ldots, p_m)$ for a
specified range $(\tilde{x}^{(1)}, \ldots, \tilde{x}^{(m)})$ of $\tilde{X}$ minimizes the objective function

$$ e(p) = \alpha_1 e_1(p) + \alpha_2 e_2(p), \quad e_1(p) = \int \big( \tilde{F}(x) - F(x) \big)^2 dx, \quad e_2(p) = \sum_{r} \big( \tilde{\mu}(r) - \mu(r) \big)^2, \tag{A.21} $$

under the constraints $p_k \ge 0$, $k = 1, \ldots, m$, and $\sum_{k=1}^{m} p_k = 1$ (Example A.8).
Alternative objective functions can be constructed based on entropy or other metrics.

Fig. A.1 Estimates of the first six moments of a Gamma random variable

The heavy dotted line in Fig. A.1 shows the exact first six moments of $X$ assumed to
be a Gamma random variable with density $f(x) = x^{r-1} \eta^r \exp(-\eta x) / \Gamma(r)$, $x > 0$,
and parameters $r = 2$ and $\eta = 3$. The approximate first six moments of $X$ delivered
by a SROM $\tilde{X}$ of $X$ with $m = 20$ obtained by a suboptimal algorithm with objective
function in (A.21) with $\alpha_1 = \alpha_2 = 1$ and $n_{\rm set} = 50$ are in error by less than 1%, and
are indistinguishable from those of $X$ at the figure scale. In contrast, Monte Carlo
estimates based on samples of the same length as that of $\tilde{X}$ can be inaccurate. The
thin lines in Fig. A.1 are estimates of the first six moments of $X$ corresponding to
100 sets of independent samples of $X$ of length 20 each. Monte Carlo estimates of
the first six moments of $X$ match the accuracy of the corresponding moments of $\tilde{X}$
with $m = 20$ if based on approximately 50,000 independent samples of $X$. ◇
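
The scatter of Monte Carlo moment estimates from short samples is easy to reproduce. The sketch below assumes the Gamma target of this example (shape 2, rate 3), uses the closed form $E[X^q] = \Gamma(r + q) / (\Gamma(r) \eta^q)$ for the exact moments, and repeats the experiment with 100 sets of 20 samples; the seed and helper names are arbitrary.

```python
import math
import random

def gamma_moment(q, shape=2.0, rate=3.0):
    """Exact E[X^q] = Gamma(shape + q) / (Gamma(shape) * rate^q)."""
    return math.gamma(shape + q) / (math.gamma(shape) * rate ** q)

def mc_moment(samples, q):
    """Monte Carlo estimate of E[X^q] from a list of samples."""
    return sum(x ** q for x in samples) / len(samples)

rng = random.Random(0)
exact = [gamma_moment(q) for q in range(1, 7)]

# Relative error of the sixth-moment estimate for 100 independent sets of 20 samples
rel_errors = []
for _ in range(100):
    s = [rng.gammavariate(2.0, 1.0 / 3.0) for _ in range(20)]   # scale = 1 / rate
    rel_errors.append(abs(mc_moment(s, 6) - exact[5]) / exact[5])

worst = max(rel_errors)
print(exact, worst)
```

Note that `random.gammavariate` is parameterized by shape and scale, so the rate $\eta = 3$ enters as the scale $1/3$.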

Example A.10 We have seen that quantizers tend to underestimate uncertainty, which
results in, for example, underestimation of higher order moments. Let $X$ be an
exponential variable with unit mean. The following table gives the exact first eight
moments of $X$ and approximations of these moments based on a quantizer
corresponding to the Voronoi tessellation $I_1 = (0, (x_1 + x_2)/2)$, $I_k = ((x_{k-1} + x_k)/2, (x_k + x_{k+1})/2)$,
$k = 2, \ldots, 7$, and $I_8 = ((x_7 + x_8)/2, \infty)$ with centers $\{x_k\}$ at 0.1753, 0.5725, 1.0305,
1.5712, 2.2313, 3.0792, 4.2665, and 6.2665 and on a SROM with size $m = 8$
corresponding to an objective function as in the previous example. The quantizer
underestimates the moments of $X$ and the error increases with the moment order. ◇

Order   Exact    Quantizer   SROM
1       1        1           0.9994
2       2        1.9693      2.0017
3       6        5.7476      6.0004
4       24       21.64       24.03
5       120      96.69       119.49
6       720      484.74      712.24
7       5040     2622        5083
8       40320    14913       40098
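
The quantizer moments in Example A.10 follow directly from the centers and the probabilities of the Voronoi cells under the unit-mean exponential law, $P(I_k) = e^{-\text{lo}_k} - e^{-\text{hi}_k}$. A short sketch (variable names are illustrative):

```python
import math

centers = [0.1753, 0.5725, 1.0305, 1.5712, 2.2313, 3.0792, 4.2665, 6.2665]

# Voronoi cell boundaries: midpoints between consecutive centers, with 0 and infinity
edges = [0.0] + [(a + b) / 2 for a, b in zip(centers, centers[1:])] + [math.inf]

# Cell probabilities for an exponential variable with unit mean
probs = [math.exp(-lo) - math.exp(-hi) for lo, hi in zip(edges, edges[1:])]

def quantizer_moment(q):
    """Moment of order q of the quantizer: sum_k P(I_k) * x_k^q."""
    return sum(p * a ** q for p, a in zip(probs, centers))

for q in range(1, 9):
    print(q, math.factorial(q), round(quantizer_moment(q), 4))
```

The exact moments of the exponential variable are $q!$, so the printed pairs show the growing deficit of the quantizer moments with the order $q$.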

Theorem A.10 The accuracy of SROMs increases with the number $n_{\rm set}$ of candidate
samples and with model size under some conditions. If a random element $X$ is
characterized by $n \gg m$ independent samples, the samples of SROMs $\tilde{X}$ with size $m$
become equally likely as model size increases, that is, $p_k \to 1/n$ as $m \to n$.

Proof The performance of SROMs increases with $n_{\rm set}$ under the suboptimal
algorithm provided the sets of samples used in $n'_{\rm set} > n_{\rm set}$ trials include those used in $n_{\rm set}$
trials. Under full optimization, the accuracy of SROMs increases with $m$ since the
optimization space extends as $m$ increases.

For simplicity we prove the last property for a real-valued random variable $X$, so
that $e(p)$ is given by (A.21), and has two components, $e_1(p) = \int (\tilde{F}(x) - F(x))^2 dx$
and $e_2(p) = \sum_{r} (\tilde{\mu}(r) - \mu(r))^2$. Let $\{x^{(i)},\ i = 1, \ldots, n\}$ be $n$ independent,
equally likely samples of $X$ assumed to be sufficient to characterize $X$. The best SROM
irrespective of size has the range $\{x^{(i)},\ i = 1, \ldots, n\}$ and a probability vector $p$ with
coordinates equal to $1/n$, since $e_1(p) = e_2(p) = e(p) = 0$ for this selection. Hence,
the probabilities $\{p_k\}$ for a SROM $\tilde{X}$ with size $m \le n$ delivered by the optimization
algorithm in (A.18) approach $1/n$ as $m \to n$. ▲

Generally, the discrepancy between properties of a random variable $X$ and a SROM
$\tilde{X}$ can be viewed as consisting of two components related to the sample size $n$ and
the model size $m$. For example, the difference between the distributions $F(x)$ and
$\tilde{F}(x)$ of $X$ and $\tilde{X}$ can be bounded by

$$ \big( \tilde{F}(x) - F(x) \big)^2 \le 2 \big[ \big( \tilde{F}(x) - \hat{F}(x) \big)^2 + \big( \hat{F}(x) - F(x) \big)^2 \big], $$

where $\hat{F}(x)$ denotes the empirical distribution of $X$ obtained from $n$ independent
samples $\{x^{(i)},\ i = 1, \ldots, n\}$ of $X$. The term $(\hat{F}(x) - F(x))^2$ is caused by statistical
uncertainty, and can be made as small as desired by increasing the sample size since
$\hat{F}(x) \to F(x)$ in probability as $n \to \infty$ at the continuity points of $F$. The term
$(\tilde{F}(x) - \hat{F}(x))^2$ relates to the accuracy of $\tilde{X}$, and can be reduced by increasing the
model size.

Theorem A.11 SROMs $\tilde{X}$ can be interpreted as Borel measurable mappings of
random elements $X$ if $X$ can be represented satisfactorily by $n$ independent samples.

Proof Let $(x^{(1)}, \ldots, x^{(n)})$ be $n$ independent samples of $X$. The construction of a
Borel measurable mapping $h : S \to S$ of the type in (A.13) is equivalent to that of
a partition $\{\mathcal{C}_k,\ k = 1, \ldots, m\}$ of $(x^{(1)}, \ldots, x^{(n)})$ such that all $x^{(i)}$ in $\mathcal{C}_k$ are mapped
into $\tilde{x}^{(k)} \in S$ and $p_k = P(X^{-1}(h^{-1}(\{\tilde{x}^{(k)}\}))) \simeq n_k / n$ for all $k = 1, \ldots, m$, where $n_k$
is the cardinality of $\mathcal{C}_k$. The partition $\{\mathcal{C}_k\}$ of $(x^{(1)}, \ldots, x^{(n)})$ can be constructed in two
steps. First, construct a preliminary partition $\{\mathcal{C}'_k\}$ of $(x^{(1)}, \ldots, x^{(n)})$ by, for example,
assigning a sample $x^{(i)}$ to $\mathcal{C}'_k$ if it is closer to $\tilde{x}^{(k)}$ than any other $\tilde{x}^{(l)}$, $l \ne k$, that is,
$x^{(i)}$ is assigned to $\mathcal{C}'_k$ if $\rho(x^{(i)}, \tilde{x}^{(k)}) < \rho(x^{(i)}, \tilde{x}^{(l)})$, $l \ne k$, where $\rho$ is a metric in
$S$. If $\rho(x^{(i)}, \tilde{x}^{(k)}) = \rho(x^{(i)}, \tilde{x}^{(l)})$ for a pair $(k, l)$, $x^{(i)}$ is assigned to either $\mathcal{C}'_k$ or $\mathcal{C}'_l$.
Generally, the cardinalities $n'_k$ of $\mathcal{C}'_k$ do not satisfy the condition $p_k \simeq n'_k / n$. Second,
eliminate the members of the clusters $\mathcal{C}'_k$ with $n'_k / n > p_k$ that are the farthest from
the centers $\tilde{x}^{(k)}$ until the modified versions $\mathcal{C}''_k$ of these clusters satisfy the condition
$n''_k / n \simeq p_k$, where $n''_k$ denotes the cardinality of $\mathcal{C}''_k$. The members extracted from
the clusters $\mathcal{C}'_k$ with $n'_k / n > p_k$ are assigned to the clusters $\mathcal{C}'_k$ with $n'_k / n < p_k$
depending on their closeness to the centers $\tilde{x}^{(k)}$ of these clusters and the requirement
$n''_k / n \simeq p_k$. The algorithm delivers a partition $\{\mathcal{C}_k,\ k = 1, \ldots, m\}$ of $(x^{(1)}, \ldots, x^{(n)})$
such that the members $x^{(i)}$ of $\mathcal{C}_k$ are mapped into $\tilde{x}^{(k)}$ and the probability that $X$
takes values in $\mathcal{C}_k$ is $n_k / n \simeq p_k$. ▲

We conclude with the observation that the algorithms for constructing SROMs for
random vectors extend directly to random functions. The only difference is the space
in which calculations are performed. For example, a SROM $\tilde{X}(t)$ with size $m$ for a
real-valued stochastic process $X(t)$ consists of $m$ samples $(\tilde{x}_1(t), \ldots, \tilde{x}_m(t))$ of $X(t)$
and their probabilities $(p_1, \ldots, p_m)$. Properties of $\tilde{X}(t)$ can be obtained by elementary
calculations. For example, the expectation of a functional $\max_{0 \le t \le \tau} \{\tilde{X}(t)\}$ of a
SROM $\tilde{X}(t)$ is $E[\max_{0 \le t \le \tau} \{\tilde{X}(t)\}] = \sum_{k=1}^{m} p_k \big( \max_{0 \le t \le \tau} \{\tilde{x}_k(t)\} \big)$.
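
The calculation reduces to a weighted sum over the stored sample paths. A minimal sketch, assuming three hypothetical sample paths $a_k \sin(t)$ on a time grid in $[0, 10]$ and made-up probabilities:

```python
import math

def srom_expectation(paths, probs, functional):
    """E[functional(X_tilde)] = sum_k p_k * functional(x_k) for a SROM of a process."""
    return sum(p * functional(path) for path, p in zip(paths, probs))

times = [0.1 * i for i in range(101)]                       # grid on [0, 10]
amplitudes = (0.5, 1.0, 2.0)                                # hypothetical sample paths
paths = [[a * math.sin(t) for t in times] for a in amplitudes]
probs = [0.2, 0.5, 0.3]                                     # hypothetical probabilities

e_max = srom_expectation(paths, probs, max)                 # E[max_t X_tilde(t)]
print(e_max)
```

Any other functional of the paths (a time integral, a first-passage indicator) can be passed in place of `max` without changing the structure of the calculation.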

Example A.11 Consider the translation process $X(t) = F^{-1} \circ \Phi(G(t))$, $t \in [0, \tau]$,
where $F$ is a Gamma distribution with shift, shape, and decay parameters $a = 1$, $k = 2$,
and $\lambda = 3$, $G(t)$ is a stationary Gaussian process with mean 0 and covariance
function $E[G(t + h) G(t)] = \exp(-|h|)$, and $\tau = 10$. A SROM $\tilde{X}(t)$ with size
$m = 20$ has been constructed using an optimization algorithm with objective
function given by (A.18) and (A.19) with $\alpha_1 = \alpha_2 = \alpha_3 = 1$, $w_F = 1$, $w_\mu(\cdot) = 1/\mu(\cdot)$,
$w_r(\cdot) = 1/r(\cdot)$, and $q = 6$. The left panel in Fig. A.2 shows marginal distributions of
a SROM at 40 equally spaced times in $[0, \tau]$. The dotted and solid lines in the right
panel are the exact marginal distribution and the temporal average of the marginal
distributions in the left panel. The differences between temporal averages of the first six
moments of $\tilde{X}$ and the first six moments of $X$ are 0.48%, 1.75%, 3.24%, 3.87%, 2.32%,
and −2.44%. ◇
Fig. A.2 Marginal distributions of $\tilde{X}$ and $X$

The SROM $\tilde{X}(t)$ is not stationary although $X(t)$ is a stationary process. This
unfavorable feature of SROMs is shared by all estimators for properties of stationary
processes, and can be ameliorated by allowing the samples $\{\tilde{x}_k(t)\}$ of $\tilde{X}(t)$ to start
at random times. Let $Y(t)$ be a simple stochastic process with samples $\{\tilde{x}_k(t + T_k)\}$
of probabilities $\{p_k\}$, where $\{T_k\}$ are independent $U(0, T)$ variables with $T < \tau$, so
that all samples are active in the time interval $[T, \tau - T]$. During this time interval, the
moment of order $r$ of $Y(t)$ is

$$ E[Y(t)^r] = E\{ E[Y(t)^r \mid T_1, \ldots, T_m] \} = E\Big[ \sum_{k=1}^{m} p_k \big( \tilde{x}_k(t + T_k) \big)^r \Big] = \sum_{k=1}^{m} p_k E\big[ \big( \tilde{x}_k(t + T_k) \big)^r \big] = \sum_{k=1}^{m} p_k \frac{1}{T} \int_0^T \big( \tilde{x}_k(t + u) \big)^r du, $$

and constitutes a local average of the instantaneous moment $\sum_{k=1}^{m} p_k (\tilde{x}_k(t))^r$ of
order $r$ of $\tilde{X}(t)$ during $[0, T]$. The temporal average in the figure is over the entire
time interval $[0, \tau]$. ◇


A.4 Extended Stochastic Reduced Order Models (ESROMs) 

We have seen that SROMs provide efficient approximations for random elements.
These models can also be used to solve stochastic equations, that is, equations with
random entries. Let $X$ be a random element defined on a probability space $(\Omega, \mathcal{F}, P)$
that collects the random entries of a stochastic equation. The solution $U$ of this
equation is also a random element on $(\Omega, \mathcal{F}, P)$ if the mapping $X \mapsto U$ is measurable.
Let $\tilde{X}$ be a SROM for $X$ with parameters $\{\tilde{x}_k, p_k\}$, $k = 1, \ldots, m$, and denote by $\{\tilde{u}_k\}$
the solutions of this equation with $X$ replaced by $\{\tilde{x}_k\}$. Properties of $U$ can be
approximated by properties of its SROM $\tilde{U}$ with parameters $\{\tilde{u}_k, p_k\}$, $k = 1, \ldots, m$. The
determination of moments, distributions, and other properties of $\tilde{U}$ involves
elementary calculations.

The performance of SROM-based solutions for stochastic equations depends on
the accuracy of the SROM $\tilde{X}$ and of the representation of the mapping $X \mapsto U$. The
accuracy of $\tilde{X}$ can be improved to a desired level by, for example, increasing the model
size and/or modifying the objective function used in optimization. The mapping
$X \mapsto \tilde{U}$ is described by the piecewise constant function

$$ \tilde{U} = \sum_{k=1}^{m} \tilde{u}_k \, 1(X \in \Gamma_k), \tag{A.22} $$

where $\{\Gamma_k\}$ is a measurable partition of the range $\Gamma = X(\Omega)$ of $X$ such that $p_k =
P(X^{-1}(\Gamma_k))$, $k = 1, \ldots, m$. This representation of the mapping $X \mapsto U$ is attractively
simple but may not be sufficiently accurate.

Extended stochastic reduced order models (ESROMs) are similar to SROMs, but
use piecewise linear functions, rather than piecewise constant functions as in (A.22),
to approximate the mapping $X \mapsto U$. The supports of the local linear approximations
are the cells of Voronoi tessellations centered on $\{\tilde{x}_k\}$. The implementation of the
ESROM-based method for solving stochastic equations involves the following three
steps that, for simplicity, are presented for the case in which $X$ is an $\mathbb{R}^d$-valued
random variable and $U(\cdot)$ is a real-valued random function.

- Step 1: ESROM $\tilde{X}$ for $X$. Let $\{\tilde{x}_k\}$, $k = 1, \ldots, m$, be samples of $X$, and denote by

$$ \Gamma_k = \{ x \in \Gamma : \|x - \tilde{x}_k\| < \|x - \tilde{x}_l\|,\ l \ne k \}, \quad k = 1, \ldots, m, \tag{A.23} $$

the Voronoi tessellation with centers $\{\tilde{x}_k\}$ in the range $\Gamma = X(\Omega)$ of $X$. Any
set of samples $\{\tilde{x}_k\}$ and probabilities $\{p_k = P(X^{-1}(\Gamma_k))\}$ defines an ESROM
$\tilde{X}$ for $X$. We are interested in the optimal pair $\{\tilde{x}_k, \Gamma_k\}$, $k = 1, \ldots, m$, that is,
the pair that minimizes the discrepancy between the probability laws of $\tilde{X}$ and $X$.
A suboptimal pair can be obtained by extracting subsets $\{\tilde{x}_k\}$ of size $m$ from a set of
independent samples $\{x_i\}$, $i = 1, \ldots, n$, $n \gg m$, of $X$ at random; partitioning $\{x_i\}$
in subsets $\{\Gamma_k\}$ of the Voronoi tessellation in $\Gamma$ given by (A.23); and calculating
the discrepancy between properties of $X$ and $\tilde{X}$ with samples $\{\tilde{x}_k\}$ and probabilities
$\{p_k = n_k / n\}$, where $n_k$ gives the number of members of $\{x_i\}$ in $\Gamma_k$. The ESROM
$\tilde{X}$ corresponds to the subset $\{\tilde{x}_k\}$ of $\{x_i\}$ with the smallest discrepancy.

- Step 2: Approximate solutions. Calculate the deterministic solutions $\{\tilde{u}_k\}$
corresponding to $\{X = \tilde{x}_k\}$ and the gradients $\nabla \tilde{u}_k(\cdot) = (\partial \tilde{u}_k(\cdot) / \partial x_1, \ldots, \partial \tilde{u}_k(\cdot) / \partial x_d)$,
$k = 1, \ldots, m$, of these solutions with respect to the coordinates of $X$. The gradients
$\{\nabla \tilde{u}_k\}$ can be interpreted as sensitivity factors with respect to the coordinates of
$X$. The deterministic solutions $\{\tilde{u}_k\}$ and $\{\nabla \tilde{u}_k\}$ can be used to construct the local,
piecewise linear approximation

$$ \tilde{U}_L(\cdot) = \sum_{k=1}^{m} \big[ \tilde{u}_k(\cdot) + \nabla \tilde{u}_k(\cdot) \cdot (X - \tilde{x}_k) \big] \, 1(X \in \Gamma_k) \tag{A.24} $$

for the mapping $X \mapsto U$, under the assumption that the mapping is sufficiently
smooth. The representation in (A.24) approximates $U(\cdot)$ in each cell $\Gamma_k$ by a
hyperplane tangent to $X \mapsto U$ at $(\tilde{x}_k, \tilde{u}_k)$, $k = 1, \ldots, m$. Piecewise quadratic
approximations can be constructed in a similar manner, but require derivatives of
order two of $U$ with respect to the coordinates of $X$.

- Step 3: Solution properties. The properties of $\tilde{U}_L(\cdot)$ in (A.24) depend on the
probability law of $X$, the samples of $\tilde{X}$ that define the Voronoi tessellation $\{\Gamma_k\}$,
and features of the mapping $X \mapsto U$. The construction of the tessellation $\{\Gamma_k\}$ in
high dimension is a difficult task. However, it is possible to estimate properties of
$\tilde{U}_L$ efficiently by Monte Carlo simulation without constructing the sets $\{\Gamma_k\}$. Let
$\{x_i\}$, $i = 1, \ldots, n$, be $n \gg m$ independent samples of $X$. The members of $\{x_i\}$ that
are in $\Gamma_k$ have the property $\|x_i - \tilde{x}_k\| < \|x_i - \tilde{x}_l\|$, $l \ne k$. An algorithm can be used
to identify the subsets of $\{x_i\}$ that belong to the cells of the Voronoi tessellation in
$\Gamma = X(\Omega)$. Denote the cardinality of these subsets by $n_k$. Once these subsets have
been identified, properties of $\tilde{U}_L$ can be estimated simply. For example, moments
of order $q \ge 1$ and marginal distributions of $\tilde{U}_L$ can be estimated by

$$ E[\tilde{U}_L(\cdot)^q] \simeq \sum_{k=1}^{m} \frac{n_k}{n} \Big[ \frac{1}{n_k} \sum_{x_i \in \Gamma_k} \big( \tilde{u}_k(\cdot) + \nabla \tilde{u}_k(\cdot) \cdot (x_i - \tilde{x}_k) \big)^q \Big] \quad \text{and} $$

$$ \tilde{F}_L(u; \cdot) = P(\tilde{U}_L(\cdot) \le u) \simeq \sum_{k=1}^{m} \frac{n_k}{n} \Big[ \frac{1}{n_k} \sum_{x_i \in \Gamma_k} 1\big( \tilde{u}_k(\cdot) + \nabla \tilde{u}_k(\cdot) \cdot (x_i - \tilde{x}_k) \le u \big) \Big] \tag{A.25} $$

from independent samples $\{x_i\}$ of $X$ efficiently since the functional form of $\tilde{U}_L$ is
known. Similar estimates can be constructed for other properties of $\tilde{U}_L$.
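
The three steps can be illustrated on a scalar toy problem. The sketch below assumes $X \sim U(1, 4)$, a hypothetical smooth map $U = g(X) = X^3$ standing in for the solution of a stochastic equation, and five fixed centers; it compares the piecewise constant (SROM) and piecewise linear (ESROM) surrogates sample by sample, and evaluates a moment estimate of the type in (A.25).

```python
import random

def g(x):
    """Hypothetical map X -> U (stands in for a deterministic solver)."""
    return x ** 3

def dg(x):
    """Gradient of the map with respect to x (sensitivity factor)."""
    return 3.0 * x ** 2

def nearest(x, centers):
    """Index k of the Voronoi cell containing x, found without building the cells."""
    return min(range(len(centers)), key=lambda k: abs(x - centers[k]))

def surrogate_values(samples, centers):
    """Piecewise constant and piecewise linear surrogate values at each sample."""
    u = [g(c) for c in centers]
    du = [dg(c) for c in centers]
    const, linear = [], []
    for x in samples:
        k = nearest(x, centers)
        const.append(u[k])
        linear.append(u[k] + du[k] * (x - centers[k]))
    return const, linear

def moment(values, q):
    """(1/n) sum_i v_i^q, which equals sum_k (n_k/n) [(1/n_k) sum_{x_i in cell k} ...]."""
    return sum(v ** q for v in values) / len(values)

rng = random.Random(0)
samples = [rng.uniform(1.0, 4.0) for _ in range(2000)]
centers = [1.3, 1.9, 2.5, 3.1, 3.7]

const, linear = surrogate_values(samples, centers)
exact = [g(x) for x in samples]
err_const = sum(abs(a - b) for a, b in zip(const, exact)) / len(exact)
err_linear = sum(abs(a - b) for a, b in zip(linear, exact)) / len(exact)

print(err_const, err_linear, moment(exact, 2), moment(const, 2), moment(linear, 2))
```

The piecewise linear surrogate has a much smaller sample-wise error. Low-order moments of the two surrogates can nevertheless be close when the cells are nearly symmetric about their centers, since errors of opposite sign cancel in the piecewise constant estimate; the gain shows mainly in distributions and in sample-wise accuracy.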

If $X$ is a random function $X(s)$ of time and/or space, it has to be represented by
a parametric model, that is, a deterministic function $X_P(s, Z)$ of time and/or space
depending on a random vector $Z$ (Sect. A.1). The stochastic dimension of $X_P(s, Z)$
is finite, in contrast to that of $X(s)$ that, generally, is not. The previous steps can be
applied to solve stochastic equations depending on $X(s)$ by replacing this random
function with $X_P(s, Z)$.

Example A.12 Let $U(t)$ be the state of the dynamic system

$$ \dot{U}(t) = \alpha U(t) + \beta U(t)^3 + U(t) X(t), \quad t \ge 0, \tag{A.26} $$

where $X(t) = \sum_{j=1}^{d} Z_j \varphi_j(t)$ is a parametric input process, $Z = (Z_1, \ldots, Z_d)$
denotes an $\mathbb{R}^d$-valued random variable, $\{\varphi_j(t)\}$ are specified deterministic functions,
and $\alpha$, $\beta$ are real constants. Let $\{\tilde{z}_k\}$ be the samples of an ESROM $\tilde{Z}$ for $Z$ and
$\{\Gamma_k\}$ the cells of a Voronoi tessellation centered on the samples of $\tilde{Z}$. The deterministic
solutions $\tilde{u}_k(t)$ and the coordinates $w_{k,j}(t) = \partial \tilde{u}_k(t) / \partial z_j$ of the gradients of these
solutions can be calculated from
$$ \dot{\tilde{u}}_k(t) = \alpha \tilde{u}_k(t) + \beta \tilde{u}_k(t)^3 + \tilde{u}_k(t) \sum_{j=1}^{d} \tilde{z}_{k,j} \varphi_j(t) \quad \text{and} $$

$$ \dot{w}_{k,j}(t) = \Big( \alpha + 3 \beta \tilde{u}_k(t)^2 + \sum_{l=1}^{d} \tilde{z}_{k,l} \varphi_l(t) \Big) w_{k,j}(t) + \tilde{u}_k(t) \varphi_j(t), \quad j = 1, \ldots, d. \tag{A.27} $$

Fig. A.3 Deterministic solutions $\tilde{u}_k(t)$

Fig. A.4 Coordinates of gradient $\nabla \tilde{u}_k(t) = \{w_{k,j}\}$ with respect to $Z_1$ (left panel) and $Z_2$ (right
panel)

The piecewise linear approximation $\tilde{U}_L(t)$ of $U(t)$ is given by (A.24). The following
numerical results are for $d = 2$, $(Z_1, Z_2)$ translation Beta random variables with
range $[1, 4]$, shape parameters $(p, q) = (1, 3)$, and correlation coefficient $\rho = 0.2$
between their Gaussian images, $\alpha = 1$, $\beta = -1$, initial state $U(0) = 1$, time interval
$[0, 10]$, $\varphi_1(t) = \cos(\nu t)$, $\varphi_2(t) = \sin(\nu t)$, and $\nu = 1$. The samples $\{\tilde{z}_k\}$ of a SROM
$\tilde{Z}$ with $m = 10$ are


$\tilde{z}_{k,1}$ = 1.36; 1.39; 1.37; 2.77; 1.58; 2.30; 1.19; 1.22; 2.05; 1.71
$\tilde{z}_{k,2}$ = 1.33; 1.00; 2.81; 1.25; 1.01; 2.55; 2.29; 1.53; 1.37; 2.03.

The probabilities of these samples are $\{p_k\}$ = 0.0001; 0.0001; 0.2075; 0.2435;
0.0882; 0.0598; 0.0713; 0.0967; 0.1403; 0.0924.

Fig. A.5 Monte Carlo estimate of $E[U(t)^6]$ (heavy solid line) and SROM-based
approximation of this moment (heavy dotted line)

Fig. A.6 Monte Carlo estimate (thin solid line), SROM-based approximation (thin solid
piecewise constant line), and ESROM-based approximation (heavy dashed line) for the
marginal distribution $F(u; t) = P(U(t) \le u)$

Figure A.3 shows the deterministic solutions $\tilde{u}_k(t)$ corresponding to $Z = \tilde{z}_k$,
$k = 1, \ldots, m$, that is, the samples of $\tilde{U}(t)$. The sensitivity factors $w_{k,j}$, $j = 1, 2$,
are shown in Fig. A.4. The heavy solid line in Fig. A.5 is a Monte Carlo estimate
of $E[U(t)^6]$. The dashed line is an approximation of this moment delivered by the
SROM of $U(t)$ corresponding to $\tilde{Z}$. The accuracy of the approximation is remarkable
given that it is obtained from only $m = 10$ deterministic solutions. The ESROM-based
solution in (A.25) is indistinguishable from the Monte Carlo estimate of
$E[U(t)^6]$ at the figure scale. For this system, both SROM-based methods provide
accurate approximations for $E[U(t)^6]$.

The thin solid and heavy dashed lines in Fig. A.6 are a Monte Carlo estimate and an
ESROM-based approximation for the marginal distribution $F(u; t) = P(U(t) \le u)$
of $U(t)$ at time $t = 1$. The piecewise constant thin solid line in the figure is an
approximation of $F(u; t)$, $t = 1$, delivered by the SROM method. The ESROM-based
approximation for $F(u; t)$ is a significant improvement over that based on
the SROM method. The accurate description of the mapping $Z \mapsto U$, piecewise
linear rather than piecewise constant representation, is the reason for the superior
performance of the ESROM method. ◇
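
The deterministic solutions and their sensitivities can be obtained by integrating the state equation together with its derivative with respect to $z_j$. The following is a minimal sketch, assuming the parameter values quoted in the example ($\alpha = 1$, $\beta = -1$, $\nu = 1$, $U(0) = 1$), the first listed sample $(1.36, 1.33)$ of $\tilde{Z}$, a classical fourth-order Runge-Kutta scheme with a fixed step, and a final time $t = 1$; the integrator and step size are illustrative choices, not those of the text.

```python
import math

ALPHA, BETA, NU = 1.0, -1.0, 1.0
PHI = (lambda t: math.cos(NU * t), lambda t: math.sin(NU * t))

def rhs(t, y, z):
    """Right-hand side for y = (u, w_1, w_2): state and sensitivities du/dz_j."""
    u, w = y[0], y[1:]
    x = sum(zj * phi(t) for zj, phi in zip(z, PHI))        # input X(t) for Z = z
    du = ALPHA * u + BETA * u ** 3 + u * x
    jac = ALPHA + 3.0 * BETA * u ** 2 + x                  # d(du)/du
    dw = [jac * wj + u * phi(t) for wj, phi in zip(w, PHI)]
    return [du] + dw

def rk4(z, t_end, dt=1e-3, u0=1.0):
    """Integrate state and sensitivities from 0 to t_end with fixed-step RK4."""
    y, t = [u0, 0.0, 0.0], 0.0                             # U(0) does not depend on z
    for _ in range(round(t_end / dt)):
        k1 = rhs(t, y, z)
        k2 = rhs(t + dt / 2, [a + dt / 2 * b for a, b in zip(y, k1)], z)
        k3 = rhs(t + dt / 2, [a + dt / 2 * b for a, b in zip(y, k2)], z)
        k4 = rhs(t + dt, [a + dt * b for a, b in zip(y, k3)], z)
        y = [a + dt / 6 * (b + 2 * c + 2 * d + e)
             for a, b, c, d, e in zip(y, k1, k2, k3, k4)]
        t += dt
    return y

z_k = (1.36, 1.33)
u_T, w1_T, w2_T = rk4(z_k, 1.0)

# Local linear prediction of the solution at a nearby point z_k + dz
dz = (0.05, -0.03)
u_lin = u_T + w1_T * dz[0] + w2_T * dz[1]
u_ref = rk4((z_k[0] + dz[0], z_k[1] + dz[1]), 1.0)[0]
print(u_T, w1_T, w2_T, u_lin, u_ref)
```

The last two printed values compare the tangent-plane prediction with a direct solution at the perturbed parameter, which is the approximation the ESROM uses within each Voronoi cell.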


Appendix B 

A Primer on Functional Analysis 


B.1 Metric Spaces


Definition B.1 Let $M$ be a set and define a function $d : M \times M \to \mathbb{R}$ such that

$d(x, y) \ge 0$, $d(x, y) = 0 \iff x = y$ (positive definite)
$d(x, y) = d(y, x)$ (symmetry)
$d(x, y) \le d(x, z) + d(z, y)$ (triangle inequality) (B.1)

for all $x, y, z \in M$. The function $d$ with the properties in (B.1) is said to be a metric or
distance on $M$, and the pair $(M, d)$ is called a metric space. A set $M$ can be equipped
with two or more metrics.

Example B.1 Let $M$ be an arbitrary nonempty set. Then

$$ d(x, y) = 1 - \delta_{xy}, \quad x, y \in M, \tag{B.2} $$

where $\delta_{xy} = 1$ if $x = y$ and $\delta_{xy} = 0$ otherwise, is a metric on $M$, called the discrete
metric, and $(M, d)$ is a metric space. ◇

Example B.2 Let $M = \mathbb{R}^q$ and define

$$ d(x, y) = \Big( \sum_{i=1}^{q} (x_i - y_i)^2 \Big)^{1/2}, \qquad d(x, y) = \max_{1 \le i \le q} |x_i - y_i|, \tag{B.3} $$

where $(x_1, \ldots, x_q)$ and $(y_1, \ldots, y_q)$ denote the coordinates of $x$ and $y$ in $\mathbb{R}^q$. It is
clear that these functions have the first two properties in (B.1). Since $\max_{1 \le i \le q} |x_i - y_i|
\le \max_{1 \le i \le q} (|x_i - z_i| + |z_i - y_i|) \le \max_{1 \le i \le q} |x_i - z_i| + \max_{1 \le i \le q} |z_i - y_i|$, the
second function $d$ satisfies the triangle inequality, so that it is a metric on $\mathbb{R}^q$. That
the first function in (B.3) satisfies the triangle inequality follows from the Cauchy-
Schwarz inequality given by (B.17), so that $M = \mathbb{R}^q$ with the first distance in (B.3) is
a metric space, called the $q$-dimensional Euclidean space. The distance $d$ is referred
to as the Euclidean metric on $\mathbb{R}^q$. ◇
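
The properties in (B.1) can be spot-checked numerically for the two distances in (B.3). The sketch below assumes random points in $\mathbb{R}^3$ and a small tolerance for rounding; such a check illustrates the definitions but is not a proof.

```python
import math
import random

def d_euclid(x, y):
    """First distance in (B.3): Euclidean metric."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def d_max(x, y):
    """Second distance in (B.3): maximum coordinate difference."""
    return max(abs(a - b) for a, b in zip(x, y))

def violates_triangle(d, points, tol=1e-12):
    """True if d(x, y) > d(x, z) + d(z, y) for some triple of the given points."""
    return any(d(x, y) > d(x, z) + d(z, y) + tol
               for x in points for y in points for z in points)

rng = random.Random(0)
points = [tuple(rng.uniform(-1.0, 1.0) for _ in range(3)) for _ in range(15)]

ok_euclid = not violates_triangle(d_euclid, points)
ok_max = not violates_triangle(d_max, points)
symmetric = all(d_euclid(x, y) == d_euclid(y, x) and d_max(x, y) == d_max(y, x)
                for x in points for y in points)
print(ok_euclid, ok_max, symmetric)
```

The same set $\mathbb{R}^q$ thus carries two different metrics, with $\max_i |x_i - y_i|$ never exceeding the Euclidean distance.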

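Example B.3 Let $M$ consist of infinite sequences of reals $x = (x_1, x_2, \ldots)$ such that
$\sum_{i=1}^{\infty} x_i^2 < \infty$. Then

$$ d(x, y) = \Big( \sum_{i=1}^{\infty} (x_i - y_i)^2 \Big)^{1/2} \quad \text{and} \quad d(x, y) = \sum_{i=1}^{\infty} \frac{1}{2^i} \, \frac{|x_i - y_i|}{1 + |x_i - y_i|} \tag{B.4} $$

are metrics on $M$. ◇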

Proof Consider the first function $d(x, y)$ in (B.4). That $d(x, y)$ satisfies the first two
properties in (B.1) follows from its definition. For the third property, note that

$$ \Big( \sum_{i=1}^{n} (x_i - y_i)^2 \Big)^{1/2} \le \Big( \sum_{i=1}^{n} x_i^2 \Big)^{1/2} + \Big( \sum_{i=1}^{n} y_i^2 \Big)^{1/2} $$

holds for any $n \ge 1$ since $\sum_{i=1}^{n} (x_i - y_i)^2 \le \sum_{i=1}^{n} x_i^2 + \sum_{i=1}^{n} y_i^2 + 2 \sum_{i=1}^{n} |x_i y_i|$
and $\sum_{i=1}^{n} |x_i y_i| \le \big( \sum_{i=1}^{n} x_i^2 \big)^{1/2} \big( \sum_{i=1}^{n} y_i^2 \big)^{1/2}$ by the Cauchy-Schwarz inequality,
so that $\sum_{i=1}^{\infty} (x_i - y_i)^2 < \infty$. The above inequality with $x_i - z_i$ and $y_i - z_i$ in place
of $x_i$ and $y_i$, respectively, gives $d(x, y) \le d(x, z) + d(z, y)$ by letting $n \to \infty$, so that
$d$ is a metric. That the second definition in (B.4) is a metric is left as an exercise. ▲

Example B.4 Let $M = \mathbb{C}$ and set

$$ d(x, y) = |x - y|, \quad x, y \in \mathbb{C}. \tag{B.5} $$

This function has the first two properties in (B.1) by its definition. Since $|a + b| \le
|a| + |b|$ holds for both real and complex numbers $a$ and $b$, the function $d$ satisfies
the triangle inequality, so that it is a metric. ◇

Example B.5 Let $M = C[0, 1]$ denote the set of real-valued continuous functions
defined on $[0, 1]$. The function

$$ d(x, y) = \sup\{ |x(t) - y(t)| : t \in [0, 1] \}, \quad x, y \in C[0, 1], \tag{B.6} $$

is a metric, so that the pair $(C[0, 1], d)$ is a metric space. We have $d(x, y) \ge 0$ and
$d(x, y) = d(y, x)$ by definition. The function $d$ is strictly positive since $d(x, y) = 0$
holds if and only if $|x(t) - y(t)| = 0$ for every $t \in [0, 1]$, that is, $x(t) = y(t)$ for every
$t \in [0, 1]$, implying $x = y$. The triangle inequality follows from $|x(t) - y(t)| \le |x(t) -
z(t)| + |z(t) - y(t)| \le d(x, z) + d(z, y)$, which shows that $d(x, z) + d(z, y)$ is an upper
bound for $\{|x(t) - y(t)|,\ t \in [0, 1]\}$. The set $\{|x(t) - y(t)|,\ t \in [0, 1]\}$ is bounded since
continuous functions on closed bounded intervals are bounded ([12], Theorem 16.1). Since
the supremum of a set is its smallest upper bound, we have $d(x, y) = \sup\{|x(t) -
y(t)|,\ t \in [0, 1]\} \le d(x, z) + d(z, y)$, so that $d$ satisfies the triangle inequality. The
function $d$ in (B.6) is called the sup-metric. ◇
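
On a computer the sup-metric can only be approximated, for example by a maximum over a fine grid. The sketch below assumes a uniform grid of 1001 points on $[0, 1]$ and three illustrative functions; the grid maximum is a lower bound for the supremum and coincides with it when the maximizer lies on the grid.

```python
import math

GRID = [i / 1000 for i in range(1001)]          # uniform grid on [0, 1]

def d_sup(x, y, grid=GRID):
    """Grid approximation of the sup-metric d(x, y) = sup_t |x(t) - y(t)|."""
    return max(abs(x(t) - y(t)) for t in grid)

def x(t):
    return math.sin(math.pi * t)

def y(t):
    return t

def z(t):
    return t * t

d_yz = d_sup(y, z)                              # sup |t - t^2| = 1/4, attained at t = 1/2
triangle_ok = d_sup(x, y) <= d_sup(x, z) + d_sup(z, y)
print(d_sup(x, y), d_sup(x, z), d_yz, triangle_ok)
```

The triangle inequality also holds for the grid approximation, since a maximum over finitely many points is itself a metric of the second type in (B.3).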

We conclude with the observation that the collection of real-valued continuous
functions defined on a compact set $K$ and the collection of bounded real-valued functions
defined on a non-empty set are metric spaces under the sup-metric. Alternative metrics on
these spaces can be constructed simply. For example, $d(x, y) = \int_0^1 |x(t) - y(t)|\,dt$ is also
a metric on $C[0, 1]$ ([13], Sect. 2.2, [14], Sect. 1.1).
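
The two metrics on $C[0, 1]$ mentioned above can be compared numerically. The following Python fragment is only a sketch: the function names, the grid size, and the test functions are illustrative choices, and both metrics are approximated on a finite grid rather than computed exactly.

```python
import math

def sup_metric(x, y, a=0.0, b=1.0, n=1000):
    """Approximate sup{|x(t) - y(t)| : t in [a, b]} on a uniform grid."""
    ts = [a + (b - a) * k / n for k in range(n + 1)]
    return max(abs(x(t) - y(t)) for t in ts)

def integral_metric(x, y, a=0.0, b=1.0, n=1000):
    """Approximate the integral of |x(t) - y(t)| over [a, b] (midpoint rule)."""
    h = (b - a) / n
    mids = [a + (k + 0.5) * h for k in range(n)]
    return h * sum(abs(x(t) - y(t)) for t in mids)

# Illustrative members of C[0, 1]
x = lambda t: t
y = lambda t: t * t
z = lambda t: math.sin(t)

d_sup = sup_metric(x, y)       # exact value is 1/4, attained at t = 1/2
d_int = integral_metric(x, y)  # exact value is 1/6

# Triangle inequality for the sup-metric, checked on the same grid
triangle_ok = d_sup <= sup_metric(x, z) + sup_metric(z, y) + 1e-12
```

The two distances differ (1/4 versus 1/6 for this pair), which illustrates that the same set can carry several metrics.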


B.1.1 Topology Generated by a Metric 

Definition B.2 Consider an arbitrary non-empty set $X$. A collection $\mathcal{T}$ of subsets
of $X$ is a topology on $X$ if it (1) contains the empty set $\emptyset$ and the entire space $X$, (2) is
closed under finite intersections, that is, $A, B \in \mathcal{T}$ implies $A \cap B \in \mathcal{T}$, and (3) is closed
under arbitrary unions, that is, $A_i \in \mathcal{T}$, $i \in I$, implies $\cup_{i \in I} A_i \in \mathcal{T}$, where $I$ is an
arbitrary index set.

Definition B.3 Let $(M, d)$ be a metric space. The open ball or the open sphere with
center $x \in M$ and radius $r > 0$ is the set

$B(x, r) = \{y \in M : d(x, y) < r\}.$ (B.7)

Definition B.4 A subset $A$ of $M$ is open if for every $x \in A$ there exists $r > 0$ such that
$B(x, r)$ is contained in $A$. The complement of an open set is a closed set. A subset
of $M$ containing a point $x \in M$ is a neighborhood of this point if it contains an open
ball centered at $x$.

Example B.6 The ball $B(x, r)$ in (B.7) is an open set since for every $y \in B(x, r)$ we
have $B(y, r') \subset B(x, r)$ for $0 < r' \le r - d(x, y)$ by the triangle inequality. ◊

Example B.7 The ball $B(x, r)$ in (B.7) with $M = \mathbb{R}^2$ is a disc of radius $r > 0$ and a
square with sides $2r$ centered at $x \in \mathbb{R}^2$ for the first metric and the second metric in
(B.3), respectively. ◊

Theorem B.1 The collection of open sets constructed on a metric space $(M, d)$ using
open balls defines a topology $\mathcal{T}$ on this space, referred to as the topology induced
by metric $d$.

Proof Elementary arguments show that intersections of two open sets and unions
of open sets are open sets ([14], Sect. 1.2, [15], Sect. III.2). Since $M$ and $\emptyset$ are both
open and closed sets, the pair $(M, \mathcal{T})$ is a topological space. ▲

Definition B.5 A Hausdorff, separated, or $T_2$ space is a topological space for which
distinct points have disjoint neighborhoods.

Theorem B.2 A metric space $(M, d)$ is a Hausdorff space.

Proof Let $x, y \in M$, $x \ne y$, and select $r', r'' > 0$ such that $r' + r'' < d(x, y)$. Then
$B(x, r')$ and $B(y, r'')$ represent two disjoint neighborhoods of $x$ and $y$, so that $M$
with the topology induced by metric $d$ is a Hausdorff space. ▲

Definition B.6 Consider two topological spaces $(X, \mathcal{T}_X)$ and $(X', \mathcal{T}_{X'})$. A function
$f : X \to X'$ is said to be continuous at $x \in X$ if for any neighborhood $D'$
of $f(x) \in X'$ there exists a neighborhood $D$ of $x \in X$ such that $f(D) \subset D'$ or,
equivalently, $f^{-1}(D')$ is a neighborhood of $x \in X$ for any neighborhood $D'$ of $f(x)$. If $f$
is continuous at every $x \in X$, then $f$ is said to be continuous.

Definition B.7 If $M$ and $M'$ are metric spaces with distances $d$ and $d'$, respectively,
$f : M \to M'$ is continuous at $x \in M$ if for any open ball $B(f(x), \varepsilon)$, $\varepsilon > 0$, in $M'$
there exists an open ball $B(x, \delta)$, $\delta > 0$, in $M$ such that $f(B(x, \delta)) \subset B(f(x), \varepsilon)$ or,
equivalently, given $\varepsilon > 0$ there exists $\delta > 0$ such that $f(y) \in B(f(x), \varepsilon)$ whenever
$y \in B(x, \delta)$.

Note that Definition B.7 is the classical $\varepsilon$-$\delta$ definition of continuity stating that $f$ is
continuous at $x \in M$ if given $\varepsilon > 0$ there exists $\delta > 0$ such that $d'(f(x), f(y)) < \varepsilon$
whenever $d(x, y) < \delta$. If $M$ and $M'$ are $\mathbb{R}$, then $d'(f(x), f(y)) = |f(x) - f(y)|$ and
$d(x, y) = |x - y|$.

Theorem B.3 Let $(M, d)$ and $(M', d')$ be metric spaces. A function $f : M \to M'$
is continuous at $x \in M$ if and only if, for every sequence $\{x_n\}$ in $M$, the convergence
$x_n \to x$ in $(M, d)$ implies the convergence $f(x_n) \to f(x)$ in $(M', d')$ ([13], Sect. 2.3, [15], Sect. II.4).

Theorem B.4 Let $(M, d)$ and $(M', d')$ be metric spaces. Suppose a function $f : M \to
M'$ has the property $d'(f(x), f(y)) \le c\,d(x, y)$, $x, y \in M$, for a fixed number $c > 0$.
Then $f$ is continuous.

Proof The convergence $x_n \to x$ in $(M, d)$ means $d(x_n, x) \to 0$ so that $d'(f(x_n),
f(x)) \le c\,d(x_n, x) \to 0$ as $n \to \infty$, implying that $f$ is continuous at $x \in M$ by
Theorem B.3. ▲

We conclude this section with two definitions involving metric spaces and 
mappings between these spaces. 

Definition B.8 Let $(M, d)$ and $(M', d')$ be metric spaces. A mapping $f : (M, d) \to
(M', d')$ is a homeomorphism if it is invertible and both $f$ and $f^{-1}$ are continuous
functions. If such a mapping exists, $(M, d)$ and $(M', d')$ are said to be homeomorphic
spaces.

Definition B.9 Let $(M, d)$ and $(M', d')$ be metric spaces. A mapping $f : (M, d) \to
(M', d')$ is an isometry if $d(x, y) = d'(f(x), f(y))$ for all $x, y \in M$. If such a
mapping exists, $(M, d)$ and $(M', d')$ are said to be isometric spaces.

Isometric spaces are equivalent in the sense that, for example, the image $\{f(x_n)\}$
of a sequence $\{x_n\}$ in $M$ converging to $x \in M$ is convergent in $M'$ since $d'(f(x_n),
f(x)) = d(x_n, x) \to 0$ as $n \to \infty$.


B.1.2 Closed, Complete, and Compact Sets

A closed set in a topological space is the complement of an open set. This definition 
was used for metric spaces with topologies induced by their metrics. We give an 
alternative definition for closed sets, define complete and compact sets, and examine 
the relationship between closed, complete, and compact sets. 

Definition B.10 Let $(M, d)$ be a metric space, $A \subset M$, and $\{a_n\} \subset A^c$ a sequence
converging to $a$. We say that $A^c$ is closed if $a \in A^c$ for every such sequence, that is, $A^c$
contains all its accumulation points. A point $a$ is an accumulation point for a set $S$ if, for every
$\varepsilon > 0$, the ball $B(a, \varepsilon)$ contains a point $x \in S$ distinct from $a$.

Theorem B.5 Let $A \subset M$ be a subset of a metric space $(M, d)$. Then $A$ is open in
the topology induced by metric $d$, that is, $A^c$ is closed in this topology, if and only if
$A^c$ is closed according to Definition B.10, that is, the two definitions of closed sets
are equivalent.

Proof First we show that if $A$ is open in the topology induced by $d$, that is, $A^c$ is
closed in this topology, then $A^c$ is closed by Definition B.10. Consider a sequence
$\{a_n\} \subset A^c$ converging to $a$, and assume $a \in A$. Then, there exists $r > 0$ such that
$B(a, r) \subset A$ so that $\{a_n\} \subset A^c \subset B(a, r)^c$. Since $d(a_n, a) \ge r$ for all $n$, the
sequence $\{a_n\}$ cannot converge to $a \in A$, so that we must have $a \in A^c$, implying that
$A^c$ is closed according to Definition B.10.

Second we show that if $A^c$ is closed by Definition B.10, then $A$ is open in the
topology induced by metric $d$. Suppose $A$ is not open in this topology. Then, there
exists $a \in A$ such that $B(a, 1/n) \not\subset A$, $n = 1, 2, \ldots$, implying that there is a sequence
$a_n \in A^c$ with $d(a, a_n) < 1/n$ that converges to $a$. We have constructed a convergent
sequence $\{a_n\}$ in $A^c$ so that its limit $a$ must be in $A^c$ since $A^c$ is assumed to be closed
by Definition B.10. This is in contradiction with the assumption $a \in A$, so that $A$ must
be open in the topology induced by metric $d$. ▲

Example B.8 Let $a \in M$ be an arbitrary point of a metric space $M$. The set $\{a\}$ is
closed since the only sequence contained in this set is $\{a, a, \ldots\}$ and this sequence
converges to $a$. ◊

Example B.9 The quadrant $\{(x, y) \in \mathbb{R}^2 : x^2 + y^2 \le 1, x \ge 0, y \ge 0\}$ is a closed
set since, for example, any converging sequence $(x_n, y_n)$ in this set satisfies the
conditions $x_n^2 + y_n^2 \le 1$, $x_n \ge 0$, and $y_n \ge 0$ so that its limit is also in this set. We
also note that the intersection of closed sets $\{A_i^c\}$ is itself closed since $A = \cup_i A_i$
is open so that $A^c = \cap_i A_i^c$ is closed. This observation can be used to show that
$\{(x, y) \in \mathbb{R}^2 : x^2 + y^2 \le 1, x \ge 0, y \ge 0\}$ is closed as an intersection of the
closed sets $\{(x, y) \in \mathbb{R}^2 : x^2 + y^2 \le 1\}$, $\{(x, y) \in \mathbb{R}^2 : x \ge 0\}$, and $\{(x, y) \in \mathbb{R}^2 :
y \ge 0\}$. ◊

Definition B.11 A set $A \subset M$ of a metric space $(M, d)$ is complete if whenever a
sequence $\{a_n\} \subset A$ has the property $d(a_m, a_n) \to 0$ as $m, n \to \infty$, it is convergent
with limit in $A$. Sequences $\{a_n\}$ with the property $d(a_m, a_n) \to 0$ as $m, n \to \infty$ are
called Cauchy sequences.

Theorem B.6 Let $A$ be a complete set of a metric space $(M, d)$. Then $A$ is also closed.

Proof Suppose $A$ is not closed, so that we can find a sequence $\{a_n\} \subset A$ converging
to $a \notin A$. Since the sequence is convergent, $d(a_m, a_n) \le d(a_m, a) + d(a, a_n) \to 0$ as
$m, n \to \infty$ so that $\{a_n\}$ is Cauchy. Since $A$ is complete, $a$ must be in $A$, in contradiction
with our assumption. ▲

Arguments similar to those used to prove Theorem B.6 can be used to show that
a closed subset $A$ of a complete set $B$ is also complete, where $A$ and $B$ are subsets
of a metric space $(M, d)$. This implies that any closed subset of a complete metric
space is complete.

Theorem B.7 The spaces $\mathbb{R}^d$, $d \ge 1$, $\mathbb{C}$, and $C[0, 1]$ are complete metric spaces
([13], Sect. 3.4).

Proof We give a partial proof of the fact that $C[0, 1]$ with the sup-metric is complete.
Let $\{x_n\} \subset C[0, 1]$ be a Cauchy sequence, that is, $d(x_m, x_n) \to 0$ as $m, n \to \infty$.
Since $0 \le |x_m(t) - x_n(t)| \le d(x_m, x_n)$ for each $t \in [0, 1]$, the numerical sequence
$\{x_n(t)\}$ is Cauchy and converges to a limit $x(t) \in \mathbb{R}$ for each $t \in [0, 1]$ since $\mathbb{R}$ is
complete. It remains to show that the function $t \mapsto x(t)$ is continuous, that is,
$x \in C[0, 1]$, and that $d(x_n, x) \to 0$ as $n \to \infty$. This part of the proof can be found
in [13] (Theorem 3.9). ▲

Definition B.12 A subset $A$ of a metric space $(M, d)$ is said to be compact if every
sequence in $A$ has a subsequence convergent to a point in $A$.

Theorem B.8 A compact set $A$ in a metric space is also complete.

Proof Let $A$ be a compact set and let $\{a_n\}$ be a Cauchy sequence in $A$, that is,
$d(a_m, a_n) \to 0$ as $m, n \to \infty$. Since $A$ is compact, there exists a convergent
subsequence $\{a_{k_n}\}$ of $\{a_n\}$ with limit $a \in A$. We have $0 \le d(a_n, a) \le d(a_n, a_{k_n}) +
d(a_{k_n}, a)$, $d(a_n, a_{k_n}) \to 0$ as $n \to \infty$ since $\{a_n\}$ is Cauchy, and $d(a_{k_n}, a) \to 0$ as
$n \to \infty$ since $\{a_{k_n}\}$ is convergent. This shows that an arbitrary Cauchy sequence in
$A$ is convergent with limit in $A$, so that $A$ is complete. ▲

Example B.10 The converse of Theorem B.8 is not true. For example, the real line
$\mathbb{R}$ is complete but is not compact since, for example, the sequence $\{1, 2, \ldots\}$
has no convergent subsequence. ◊

The statements of Theorems B.6 and B.8 can be summarized by the implications
compact $\Rightarrow$ complete $\Rightarrow$ closed.

Theorem B.9 A set $A$ in the Euclidean space $\mathbb{R}^q$ is compact if and only if it is closed and bounded, that is,
there exists a number $d_0$ such that $d(a', a'') \le d_0$ for all $a', a'' \in A$.

Proof We have seen that compact sets in metric spaces are also closed. Suppose a
compact set $A$ is not bounded. Then we could construct a sequence $\{a, a_1, a_2, \ldots\}$ in
$A$ such that $d(a, a_1) > 1$, $d(a, a_2) > 2$, and so on. Since this sequence cannot have a
convergent subsequence, we have a contradiction implying that $A$ must be bounded.
These two implications hold in any metric space. It remains to show that a closed and
bounded set $A$ is compact, a property that relies on the structure of $\mathbb{R}^q$ (see, for example,
[13], Theorem 3.12 for $M = \mathbb{R}^2$). ▲

Example B.11 A closed and bounded interval $[a, b]$ in $\mathbb{R}$ is a compact
set. However, $\mathbb{R}$ is not compact. ◊

Theorem B.10 Let $(M, d)$ and $(M', d')$ be metric spaces and $f : M \to M'$ be a
function. Then (1) $f$ is continuous if and only if $f^{-1}(A)$ is closed in $(M, d)$ whenever
$A$ is closed in $(M', d')$, (2) if $M$ is compact and $f$ is continuous, then $f(M)$ is compact,
and (3) if $M$ is compact and $f$ is continuous, then $f$ is uniformly continuous.

Proof For (1), suppose first that $f$ is continuous and $A$ is closed in $(M', d')$. Consider
a convergent sequence $x_n \to x$ in $(M, d)$ such that $\{x_n\} \subset f^{-1}(A)$. This sequence is
mapped into $\{f(x_n)\} \subset A$, which converges to $f(x)$ by continuity. Since $A$ is closed,
we have $f(x) \in A$ so that $x \in f^{-1}(A)$. Conversely, assume that $f^{-1}(A)$ is closed
whenever $A$ is closed. Consider a convergent sequence $x_n \to x$ in $(M, d)$ and assume
that its image $\{f(x_n)\}$ does not converge to $f(x)$, so that there exist $\varepsilon > 0$ and a
subsequence $\{f(x_{k_n})\}$ of $\{f(x_n)\}$ with $d'(f(x), f(x_{k_n})) \ge \varepsilon$ for all $n$. Note that
$A = \{x' \in M' : d'(f(x), x') \ge \varepsilon\}$ is closed, $\{f(x_{k_n})\} \subset A$, and $f^{-1}(A)$ is closed by
assumption. The sequence $\{x_{k_n}\} \subset f^{-1}(A)$ converges to $x$, so that $x \in f^{-1}(A)$ and
$f(x) \in A$, implying $d'(f(x), f(x)) \ge \varepsilon$, a contradiction. Hence, $\{f(x_n)\}$ must converge
to $f(x)$ so that $f$ is continuous ([13], Theorem 5.2).

For (2), take a sequence $\{f(x_n)\} \subset f(M)$ generated by a sequence $\{x_n\} \subset M$.
Since $M$ is compact, $\{x_n\}$ has a subsequence $\{x_{k_n}\}$ that converges to a point $x \in M$. By
continuity, we have $f(x_{k_n}) \to f(x) \in f(M)$, so that $f(M)$ is compact ([13], Theorem 5.5).

For (3), note that $f$ is uniformly continuous if given $\varepsilon > 0$ there exists $\delta > 0$
such that $d'(f(x), f(y)) < \varepsilon$ whenever $d(x, y) < \delta$ for all $x, y \in M$. Hence, $f$
is not uniformly continuous if there exists an $\varepsilon > 0$ such that there is no $\delta > 0$ with
the property that $d(x, y) < \delta$ implies $d'(f(x), f(y)) < \varepsilon$ for all $x, y \in M$. Suppose $f$ is
not uniformly continuous, fix such an $\varepsilon > 0$, set $\delta = 1/n$, $n = 1, 2, \ldots$, and let
$x_n, y_n \in M$ be such that $d(x_n, y_n) < 1/n$ and $d'(f(x_n), f(y_n)) \ge \varepsilon$. Since $M$ is compact,
the sequences $\{x_n\}$ and $\{y_n\}$ have convergent subsequences $x_{k_n} \to x$ and $y_{k_n} \to y$.
Since $d(x_{k_n}, y_{k_n}) \to 0$, we have $x = y$ so that $f(x_{k_n}) \to f(x)$ and $f(y_{k_n}) \to f(y) = f(x)$,
implying $d'(f(x_{k_n}), f(y_{k_n})) \to 0$ in contradiction with $d'(f(x_n), f(y_n)) \ge \varepsilon$. Hence, $f$
is uniformly continuous ([13], Theorem 5.6). ▲


B.1.3 Sequences 


Definition B.13 An infinite list of objects $\{x_1, x_2, \ldots\}$ in a set is called a sequence.
A sequence $\{x_1, x_2, \ldots\}$ in a metric space $(M, d)$ is said to be convergent and converge
to a limit $x \in M$ if $d(x_n, x) \to 0$ as $n \to \infty$.

Example B.12 Let $M = \mathbb{R}^q$ with the Euclidean metric, and let $x_n = (x_{n,1}, \ldots, x_{n,q})$,
$n = 1, 2, \ldots$, be a sequence in $M$. Then $x_n \to \xi = (\xi_1, \ldots, \xi_q) \in \mathbb{R}^q$ if and only
if $x_{n,i} \to \xi_i$ for $i = 1, \ldots, q$, that is, coordinate convergence. This holds since
$|x_{n,i} - \xi_i| \le d(x_n, \xi)$, $i = 1, \ldots, q$, so that the convergence $x_n \to \xi$ implies
the convergence by coordinates. Conversely, coordinate convergence, that is, $|x_{n,i} -
\xi_i| \to 0$, implies $(x_{n,i} - \xi_i)^2 \to 0$ for all $i = 1, \ldots, q$ so that $d(x_n, \xi) \to 0$ as
$n \to \infty$. ◊

Example B.13 A sequence may or may not be convergent depending on the
metric used to assess this property. For example, consider the sequence $x^{(n)} = (2 +
1/2^n, 1/2^n) \in \mathbb{R}^2$, $n = 1, 2, \ldots$, that converges to $(2, 0)$ with respect to the metrics
in (B.3). However, the sequence does not converge to $(2, 0)$ under the metric in (B.2) since
$d(x^{(n)}, (2, 0)) = 1$ for all $n \ge 1$ ([13], Sect. 2.3). ◊
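
The dependence of convergence on the metric can be checked numerically. The sketch below assumes that the metric in (B.2) is the discrete metric (distance 1 between distinct points, 0 otherwise), which is consistent with the value $d(x^{(n)}, (2, 0)) = 1$ quoted above, and uses the Euclidean metric as a representative of (B.3). The function names are illustrative.

```python
import math

def euclidean(u, v):
    """Euclidean distance in R^2 (assumed to be one of the metrics in (B.3))."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def discrete(u, v):
    """Assumed form of the metric in (B.2): 0 if the points coincide, 1 otherwise."""
    return 0.0 if u == v else 1.0

limit = (2.0, 0.0)
seq = [(2.0 + 0.5 ** n, 0.5 ** n) for n in range(1, 31)]

euclid_dist = [euclidean(p, limit) for p in seq]    # decreases to 0
discrete_dist = [discrete(p, limit) for p in seq]   # stays equal to 1
```

Under these assumptions the Euclidean distances shrink geometrically while the discrete distances never leave 1.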

Example B.14 Let $M = C[1, 2]$ be a metric space with the sup-metric, and let
$x_n(t) = nt/(n + t)$ be a sequence in this space. Let $x(t) = t$, a member of $C[1, 2]$.
Since $|x_n(t) - x(t)| = t^2/(n + t) \le t^2/n \le 4/n$ for $t \in [1, 2]$, we have $d(x_n, x) \le 4/n \to 0$ so that
$x_n \to x$. However, coordinate convergence, that is, the convergence $x_n(t) \to x(t)$
for each value of $t$, does not necessarily imply $x_n \to x$. For example, the
sequence $x_n(t) = t^n$, $n = 1, 2, \ldots$, in $C[0, 1]$ converges coordinate-by-coordinate to
$x(t) = 1(t = 1)$, that is, $x(t) = 0$ for $t \in [0, 1)$ and $x(1) = 1$. However, $x_n$ does not converge to $x$ in $C[0, 1]$ since $x \notin C[0, 1]$. ◊
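
The contrast between convergence in the sup-metric and coordinate-by-coordinate convergence can be illustrated as follows. This is a sketch with illustrative names; the sup-distance is approximated on a grid, and for $t^n$ the distance is bounded from below by evaluating at the hand-picked point $t_n = (1/2)^{1/n}$, where $t_n^n = 1/2$.

```python
def sup_dist(x, y, a, b, m=2000):
    """Grid approximation of sup{|x(t) - y(t)| : t in [a, b]}."""
    ts = [a + (b - a) * k / m for k in range(m + 1)]
    return max(abs(x(t) - y(t)) for t in ts)

def ratio_fn(n):
    return lambda t: n * t / (n + t)

identity = lambda t: t
pointwise_limit = lambda t: 1.0 if t == 1.0 else 0.0   # limit of t**n on [0, 1]

ns = (10, 100, 1000)

# n t / (n + t) on [1, 2]: the sup-distance to x(t) = t is at most 4 / n.
uniform_errors = [sup_dist(ratio_fn(n), identity, 1.0, 2.0) for n in ns]

# t**n on [0, 1]: at t_n = 0.5 ** (1 / n) the gap to the pointwise limit is 1/2,
# so the sup-distance cannot tend to 0.
witness_gaps = []
for n in ns:
    t_n = 0.5 ** (1.0 / n)
    witness_gaps.append(abs(t_n ** n - pointwise_limit(t_n)))
```

A fixed grid would understate the sup-distance for $t^n$ because the gap concentrates ever closer to $t = 1$; this is why a point depending on $n$ is used instead.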

Theorem B.11 Let $\{x_n\}$ be a convergent sequence in a metric space $(M, d)$. Then (1)
$x_n$ has a unique limit, (2) any subsequence of $x_n$ is convergent and converges to the
same limit, and (3) $x_n$ is a Cauchy sequence, that is, $d(x_m, x_n) \to 0$ as $m, n \to \infty$.

Proof Suppose $x_n$ has two limits, $x$ and $y$. Since $d(x, y) \le d(x, x_n) + d(x_n, y)$
and $d(x, x_n), d(y, x_n) \to 0$ as $n \to \infty$, we have $d(x, y) = 0$, that is, $x = y$. Let
$x_{k_1}, x_{k_2}, \ldots$, $k_1 < k_2 < \cdots$, be a subsequence of $x_n$. We have $d(x_{k_n}, x) \to 0$ as
$n \to \infty$ since $d(x_n, x) \to 0$ by assumption and $\{x_{k_n}\} \subset \{x_n\}$. That $x_n$ is Cauchy
follows from the triangle inequality. ▲


B.1.4 Contraction 

Definition B.14 Let $(M, d)$ be a metric space. A mapping $T : M \to M$ is said to be
a contraction if there exists a real number $\lambda \in [0, 1)$, called contraction factor, such
that $d(Tx, Ty) \le \lambda\,d(x, y)$ for all $x, y \in M$, where $Tx$ is a shorthand notation for
$T(x)$.

Definition B.15 A fixed point for $T : M \to M$ is a point $x \in M$ such that $Tx = x$.

Theorem B.12 Contractions are continuous functions and $d(T^n x, T^n y) \le
\lambda^n d(x, y)$, where $T^n$ is an abbreviation for $T \circ T^{n-1}$, $n \ge 2$.

Proof Let $x_n \to x$ be a convergent sequence, so that $d(x_n, x) \to 0$ as $n \to \infty$.
Hence, $d(Tx_n, Tx) \le \lambda\,d(x_n, x) \le d(x_n, x)$ also converges to 0 as $n \to \infty$, so that
$T$ is continuous. The inequality $d(T^n x, T^n y) \le \lambda^n d(x, y)$ results from applying the defining
property of a contraction $n$ times. ▲

Theorem B.13 (The Banach fixed point theorem) Any contraction $T : M \to M$ in
a complete metric space $(M, d)$ has a unique fixed point $x_0 \in M$. Moreover, for an arbitrary
starting point $x \in M$, $T^n x \to x_0$ as $n \to \infty$ ([14], Theorem 1.3.15).
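
The iteration $x, Tx, T^2 x, \ldots$ in the Banach fixed point theorem can be sketched as follows. The map $T(x) = \cos(x)$ on $[0, 1]$ is an illustrative choice, not taken from the text: it maps $[0, 1]$ into itself and satisfies $|T'(x)| = |\sin(x)| \le \sin(1) < 1$ there, so it is a contraction on a complete space. The function name, tolerance, and iteration cap are assumptions.

```python
import math

def fixed_point(T, x0, tol=1e-12, max_iter=1000):
    """Iterate x_{k+1} = T(x_k) until successive iterates differ by less than tol."""
    x = x0
    for k in range(1, max_iter + 1):
        x_next = T(x)
        if abs(x_next - x) < tol:
            return x_next, k
        x = x_next
    raise RuntimeError("no convergence within max_iter iterations")

# Two different starting points in [0, 1] lead to the same fixed point.
x_star, iterations = fixed_point(math.cos, 0.5)
x_other, _ = fixed_point(math.cos, 1.0)
```

Both runs end near 0.739085, the unique solution of $\cos(x) = x$, regardless of the starting point.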


B.2 Linear or Vector Spaces 


Definition B.16 A set $V$ is said to be a linear or vector space if it is closed under
finite vector addition and scalar multiplication, that is, for any $x, y, z \in V$ and scalars
$\alpha, \beta \in F$, we have

$x + y = y + x$ (commutativity)

$(x + y) + z = x + (y + z)$ (addition associativity)

$0 + x = x + 0 = x$ (addition identity)

$x + (-x) = 0$ (existence of additive inverse)

$\alpha(\beta x) = (\alpha\beta)x$ (associativity of scalar multiplication)

$(\alpha + \beta)x = \alpha x + \beta x$ (distributivity of scalar sums)

$\alpha(x + y) = \alpha x + \alpha y$ (distributivity of vector sum)

$1x = x$ (scalar multiplication identity). (B.8)

It is sufficient for our discussion to assume that the field $F$ coincides with $\mathbb{R}$ or $\mathbb{C}$. The
linear spaces corresponding to $\mathbb{R}$ and $\mathbb{C}$ are called real and complex, respectively.

Definition B.17 A subset $V_0$ of $V$ is called a linear subspace of $V$ if $x + y \in V_0$ and
$\alpha x \in V_0$ for $x, y \in V_0$ and arbitrary scalar $\alpha$. Note that a linear subspace is a linear
space and that the intersection of linear subspaces including an arbitrary set in $V$ is
a linear subspace.

Example B.15 The collection of vectors in $\mathbb{R}^d$, $d \ge 1$, with the usual addition and
scalar multiplication is a linear space. The vectors contained in an arbitrary plane through
the origin define a linear subspace of $\mathbb{R}^3$. ◊

Example B.16 The set $C[0, 1]$ of real-valued continuous functions with pointwise
addition and scalar multiplication, that is, $(x + y)(t) = x(t) + y(t)$ and $(\alpha x)(t) = \alpha x(t)$,
$x, y \in C[0, 1]$, $t \in [0, 1]$, is a linear space. These operations are similar to those in
$\mathbb{R}^d$ if we view $x(t)$, $t \in [0, 1]$, as the coordinates of the infinite-dimensional vector
defining $x$. Note also that the set of real-valued continuous functions $x$ defined on
$[0, 1]$ with the property $x(0) = x(1) = 0$ defines a linear subspace of $C[0, 1]$. Additional
examples can be found in [16] (Sect. 4.2). ◊

Definition B.18 A finite subset $B = \{x_1, \ldots, x_n\}$ of $V$ is linearly independent if
$\sum_{i=1}^n \alpha_i x_i = 0$, $\alpha_i \in F$, implies $\alpha_1 = \cdots = \alpha_n = 0$. If every $x \in V$ has the unique
representation $x = \sum_{i=1}^n \alpha_i x_i$, $\alpha_i \in F$, we say that $B$ is a basis for $V$ and that $V$ is a
finite dimensional space with dimension $n$. The scalars $\{\alpha_i\}$ are called the coordinates
of $x$ in basis $B$.

Example B.17 The $n$-dimensional vectors $(1, 0, 0, \ldots)$, $(0, 1, 0, \ldots)$, $\ldots$ define a
basis for the Euclidean space $\mathbb{R}^n$. ◊

Theorem B.14 A finite dimensional vector space $V$ may have many bases but they
have the same number of members, and this number defines the dimension of $V$.

Proof Let $B = (x_1, \ldots, x_n)$ and $B' = (y_1, \ldots, y_m)$ be bases in $V$. Suppose $m > n$ and
$(y_1, \ldots, y_n)$ are linearly independent. Since $y_j \in V$, we have $y_j = \sum_{i=1}^n a_{j,i} x_i$,
$j = 1, \ldots, m$. Let $b, c \in F$ be such that $b y_k + c y_l = \sum_{i=1}^n (b a_{k,i} + c a_{l,i}) x_i = 0$.
Since $(x_1, \ldots, x_n)$ are linearly independent, we have $b a_{k,i} + c a_{l,i} = 0$ or $a_{l,i} =
-(b/c) a_{k,i}$, $i = 1, \ldots, n$, so that the vectors $y_l$, $k \le n < l$, are linearly dependent on
$(y_1, \ldots, y_n)$, which implies that $B'$ has only $n$ linearly independent vectors. Similar
arguments show that $B'$ cannot have fewer than $n$ linearly independent vectors. ▲

Example B.18 Let $B = (x_1, \ldots, x_n)$ be a basis for an $n$-dimensional vector space $V$.
Let $x'_i = \sum_{j=1}^n c_{ij} x_j$, $i = 1, \ldots, n$, and denote by $c = \{c_{ij}\}$ the matrix defining the
mapping $(x_1, \ldots, x_n) \mapsto (x'_1, \ldots, x'_n)$. If $c$ is not singular, then $B' = (x'_1, \ldots, x'_n)$ is
a basis for $V$. This follows from the observation that $B'$ is a basis in $V$ if its members
are linearly independent, that is, $\sum_{i=1}^n \alpha_i x'_i = 0$ holds only if $\alpha_i = 0$, $i = 1, \ldots, n$, or
$\sum_{j=1}^n \big(\sum_{i=1}^n c_{ij} \alpha_i\big) x_j = 0$, implying $\sum_{i=1}^n c_{ij} \alpha_i = 0$, $j = 1, \ldots, n$, since
$(x_1, \ldots, x_n)$ are linearly independent. This system of equations has only the solution
$\alpha_1 = \cdots = \alpha_n = 0$ if the determinant of $c$ is not 0. ◊

Theorem B.15 The linear mapping $f : V \to F^n$ defined by $f(x) = (\alpha_1, \ldots, \alpha_n)$ is
an isomorphism of $V$ onto $F^n$, where $(\alpha_1, \ldots, \alpha_n)$ are the coordinates of $x \in V$ in a
basis $B = (x_1, \ldots, x_n)$.

Proof The function $f : V \to F^n$ is linear with inverse $f^{-1} : F^n \to V$ defined
by $f^{-1}(\alpha_1, \ldots, \alpha_n) = \sum_{i=1}^n \alpha_i x_i \in V$. Since $f$ is one-to-one,
that is, whenever $f(x) = f(y)$ then $x = y$, and an onto function, that is,
all members of $F^n$ are used, $f$ is an isomorphism. ▲


B.3 Normed Linear Spaces 

Definition B.19 A function $\|\cdot\| : V \to \mathbb{R}$ defined on a linear space $V$ is a norm if

$\|x\| > 0$, $x \ne 0$, $\|x\| = 0 \Leftrightarrow x = 0$ (positive definite)

$\|\alpha x\| = |\alpha|\,\|x\|$ (absolute homogeneity)

$\|x + y\| \le \|x\| + \|y\|$ (triangle inequality) (B.9)

for all $x, y \in V$ and any scalar $\alpha$. The definition requires the linear space structure
so that, for example, $x + y, \alpha x \in V$ for $x, y \in V$ and scalar $\alpha$. The pair $(V, \|\cdot\|)$ is
called a normed linear space or normed vector space. If $x \mapsto \|x\|$ does not satisfy
the condition $\|x\| = 0 \Leftrightarrow x = 0$, $\|\cdot\|$ is said to be a seminorm.

Example B.19 The norm of a vector can be interpreted as its length. For example,
the length of the vector $x = (x_1, x_2, x_3) \in \mathbb{R}^3$, referred to as the Euclidean norm, is
$\|x\| = (x_1^2 + x_2^2 + x_3^2)^{1/2}$. ◊

Theorem B.16 All norms on a finite dimensional vector space $V$ are equivalent in
the sense that a sequence converges in a norm if and only if it converges in any other
norm.

Proof Let $(e_1, \ldots, e_n)$ be a basis in $V$, so that $x \in V$ admits the representation
$x = \sum_{i=1}^n \alpha_i e_i$, $\alpha_i \in \mathbb{C}$. For an arbitrary norm $\|\cdot\|$ on $V$, we have
$\|x\| = \|\sum_{i=1}^n \alpha_i e_i\| \le \sum_{i=1}^n |\alpha_i|\,\|e_i\| \le \big(\sum_{i=1}^n \|e_i\|^2\big)^{1/2} \big(\sum_{i=1}^n |\alpha_i|^2\big)^{1/2} = c\,\|x\|_1$
by the Cauchy-Schwarz inequality, with the notation $c = \big(\sum_{i=1}^n \|e_i\|^2\big)^{1/2} > 0$, where
$\|x\|_1 = \big(\sum_{i=1}^n |\alpha_i|^2\big)^{1/2}$ denotes another norm. Let $h(\beta) = \|\sum_{i=1}^n \beta_i e_i\|$ be a
function defined on $S = \{\beta = (\beta_1, \ldots, \beta_n) \in \mathbb{C}^n : \|\beta\| = 1\}$, where $\|\beta\|$ is the
Euclidean norm of $\beta$. Since $h$ is continuous and $S$ is compact, there exists $\xi \in S$
such that $h(\xi) = \min_{\beta \in S} h(\beta)$. Moreover, $h(\xi) > 0$ since $(e_1, \ldots, e_n)$ are linearly
independent vectors, so that $\|x\| = \|\alpha\|\,h(\alpha/\|\alpha\|) \ge h(\xi)\,\|x\|_1$ for $x \ne 0$. We have
$h(\xi)\,\|x\|_1 \le \|x\| \le c\,\|x\|_1$ so that convergence in norm $\|\cdot\|_1$ is equivalent to convergence
in norm $\|\cdot\|$. Since $\|\cdot\|$ is arbitrary, any two norms on $V$ are equivalent to $\|\cdot\|_1$ and
hence to each other, which proves the theorem. ▲
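
Two-sided bounds of the kind used in the proof can be checked numerically for familiar norms on $\mathbb{R}^n$. The sketch below uses the standard inequalities between the maximum, Euclidean, and sum norms; the function names, the dimension, and the random sample are illustrative assumptions and not part of the text.

```python
import math
import random

def euclidean_norm(x):
    return math.sqrt(sum(v * v for v in x))

def max_norm(x):
    return max(abs(v) for v in x)

def sum_norm(x):
    return sum(abs(v) for v in x)

random.seed(0)
n = 5
samples = [[random.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(1000)]

eps = 1e-12  # slack for floating-point rounding
# Standard bounds in R^n:
#   max_norm(x)       <= euclidean_norm(x) <= sqrt(n) * max_norm(x)
#   euclidean_norm(x) <= sum_norm(x)       <= sqrt(n) * euclidean_norm(x)
bounds_hold = all(
    max_norm(x) <= euclidean_norm(x) + eps
    and euclidean_norm(x) <= math.sqrt(n) * max_norm(x) + eps
    and euclidean_norm(x) <= sum_norm(x) + eps
    and sum_norm(x) <= math.sqrt(n) * euclidean_norm(x) + eps
    for x in samples
)
```

Because each norm is bounded above and below by a multiple of another, a sequence that converges in one of them converges in all of them.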

Definition B.20 A Banach space is a normed linear space $(V, \|\cdot\|)$ that is complete.
Note that any normed linear space with finite dimension is a Banach space ([16],
Theorem 5.10.2).

Example B.20 Let $l^p$, $1 \le p < \infty$, be the set of all sequences $x = (x_1, x_2, \ldots)$ of
scalars such that $\sum_{k=1}^\infty |x_k|^p < \infty$. The set $l^p$ with operations similar to those in $\mathbb{R}^d$
is a linear space, and with the $p$-norm

$\|x\|_p = \Big(\sum_{k=1}^\infty |x_k|^p\Big)^{1/p}$ (B.10)

is a Banach space. Similarly, the space $l^\infty$ of all bounded sequences $x = (x_1, x_2, \ldots)$
with operations as in $l^p$ is a linear space that with the norm

$\|x\|_\infty = \sup\{|x_k| : 1 \le k < \infty\}$ (B.11)

is a Banach space. The norms in (B.10) and (B.11) defined for infinite-dimensional
vector spaces can also be used for vector spaces with finite dimension. ◊

Example B.21 Let $L^p[0, 1]$, $1 \le p < \infty$, consist of real-valued functions defined
on $[0, 1]$ such that $\int_0^1 |x(t)|^p\,dt < \infty$. Then $L^p[0, 1]$ with pointwise addition and
scalar multiplication is a linear space and, with the norm

$\|x\|_p = \Big(\int_0^1 |x(t)|^p\,dt\Big)^{1/p}$ (B.12)

is a Banach space. It is possible to have $\|x - y\| = 0$ although $x$ and $y$ differ on
a set of zero Lebesgue measure. To remove this ambiguity, we will not distinguish
between members of $L^p[0, 1]$ that are equal almost everywhere (a.e.), that
is, members $x, y \in L^p[0, 1]$ such that $\|x - y\| = 0$. A broad range of examples of
normed linear spaces can be found in [16] (Sect. 5.3). ◊

Example B.22 The space of real-valued continuous functions $C[0, 1]$ with the
norm $\|x\| = \|x\|_\infty = \sup\{|x(t)| : t \in [0, 1]\}$ is a Banach space, that is, any Cauchy
sequence $\{x_n\} \subset C[0, 1]$ is convergent and converges to a member of $C[0, 1]$ ([13],
Theorem 3.9). On the other hand, $C[0, 1]$ with the norm $\|x\| = \int_0^1 |x(t)|\,dt$ is not
complete. For example, the sequence of continuous functions $x_n(t) = 1(0 \le t \le
1/2) + 1(1/2 < t \le 1/2 + 1/n)(1 - n(t - 1/2))$, $n = 1, 2, \ldots$, is Cauchy since
$\|x_m - x_n\| \le \int_{1/2}^{1/2 + 1/n} x_n(t)\,dt \sim O(n^{-1})$ for $m > n$, but converges to $x(t) = 1(0 \le t \le
1/2)$, which is not in $C[0, 1]$. ◊
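
The behavior of this sequence under the two norms can be examined numerically. The sketch below approximates the integral norm with a midpoint rule and the sup-norm on a grid; the helper names and grid sizes are illustrative assumptions. For the pairs $(n, 2n)$ the integral distance equals $1/(4n)$, while the sup-distance stays at $1/2$, so the sequence is Cauchy in the integral norm but not in the sup-norm.

```python
def ramp(n):
    """x_n: equal to 1 on [0, 1/2], linear from 1 to 0 on (1/2, 1/2 + 1/n], then 0."""
    def f(t):
        if t <= 0.5:
            return 1.0
        if t <= 0.5 + 1.0 / n:
            return 1.0 - n * (t - 0.5)
        return 0.0
    return f

def l1_dist(x, y, m=20000):
    """Midpoint-rule approximation of the integral of |x - y| over [0, 1]."""
    h = 1.0 / m
    return h * sum(abs(x((k + 0.5) * h) - y((k + 0.5) * h)) for k in range(m))

def sup_dist(x, y, m=20000):
    """Grid approximation of sup |x - y| over [0, 1]."""
    return max(abs(x(k / m) - y(k / m)) for k in range(m + 1))

ns = (10, 100)
l1_gaps = [l1_dist(ramp(n), ramp(2 * n)) for n in ns]    # close to 1 / (4 n)
sup_gaps = [sup_dist(ramp(n), ramp(2 * n)) for n in ns]  # close to 1/2 for every n
```

This is consistent with the example: no continuous function can be the limit of the sequence, yet the integral norm does not detect the jump that forms at $t = 1/2$.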

In linear spaces, only finite sums of vectors are defined. This restriction can be
removed when dealing with normed linear spaces $(V, \|\cdot\|)$, so that infinite sums
$\sum_{i=1}^\infty x_i$, $x_i \in V$, have meaning.

Definition B.21 Let $(V, \|\cdot\|)$ be a normed linear space, $\sum_{i=1}^\infty x_i$, $x_i \in V$, an
infinite sum, and $s_n = \sum_{i=1}^n x_i$, $n = 1, 2, \ldots$, partial sums of $\sum_{i=1}^\infty x_i$. If $\{s_n\}$ is a
convergent sequence, that is, there exists $s \in V$ such that $\|s_n - s\| \to 0$ as $n \to \infty$,
then the notation $\sum_{i=1}^\infty x_i$ has meaning. We say that the sum $\sum_{i=1}^\infty x_i$ is convergent
and equal to $s = \lim_{n \to \infty} s_n$.

Theorem B.17 If $\{s_n\}$ is a convergent sequence, then $\|x_n\| \to 0$ as $n \to \infty$.

Proof Since $\{s_n\}$ is convergent, we have $\|s_m - s_n\| \le \|s_m - s\| + \|s - s_n\| \to 0$ as
$m, n \to \infty$, which implies $\|x_{n+1}\| = \|s_{n+1} - s_n\| \to 0$, $n \to \infty$, by taking $m = n + 1$. ▲

Definition B.22 Let $V'$ and $V''$ be linear subspaces of a normed linear space $V$ whose
only common point is $0 \in V$. If for every $x \in V$ there exist $x' \in V'$ and $x'' \in V''$ such
that $x = x' + x''$, we say that $V$ is the direct sum of $V'$ and $V''$ and use the notation
$V = V' \oplus V''$.

Theorem B.18 The decomposition $x = x' + x''$ is unique.

Proof If $x$ has another representation $x = y' + y''$, $y' \in V'$ and $y'' \in V''$, we have
$x' - y' = y'' - x''$ with $x' - y' \in V'$ and $y'' - x'' \in V''$. Since 0 is the only common
element of $V'$ and $V''$, it follows that $x' = y'$ and $x'' = y''$. ▲

Definition B.23 Suppose $V = V' \oplus V''$. The projection $p : V \to V'$ is the function
$p(x) = x'$, $x \in V$. The projection is a linear and continuous function.

Example B.23 Let $(x_1, \ldots, x_n)$ be a basis for an $n$-dimensional vector space $V$.
The sets $V_i = \{\alpha_i x_i : \alpha_i \text{ scalar}\}$, $i = 1, \ldots, n$, are linear subspaces of $V$ whose
only common point is $0 \in V$, so that $V = \oplus_{i=1}^n V_i$. The functions $p_i : V \to V_i$
defined by $p_i(x) = \alpha_i x_i$ for $x = \sum_{j=1}^n \alpha_j x_j \in V$, $i = 1, \ldots, n$, are projections of $V$ on its linear
subspaces $V_i$. ◊


B.3.1 Completion of Metric Spaces

Convergent sequences in metric spaces are Cauchy sequences. However, the converse
may or may not be true, as illustrated by Example B.22. If a Cauchy sequence in a
metric space $(M, d)$ is not convergent, it can be made convergent by adding points to
$M$ such that it becomes a complete space. Note that a normed linear space $(V, \|\cdot\|)$
is a metric space $(V, d)$ with the metric $d(x, y) = \|x - y\|$, $x, y \in V$.

Following are two essential results on the completion of metric spaces that are 
given without proof, the completion theorem and the Weierstrass approximation 
theorem. 

Theorem B.19 (The completion theorem) For any metric space $(M, d)$ there exists
a complete metric space $(M', d')$ and a dense subset $\tilde{M}$ of $M'$ such that $(M, d)$ and
$(\tilde{M}, d')$ are isometric. All completions of $(M, d)$ are isometric, that is, they preserve
the distance ([14], Theorem 2.1.6).

Theorem B.20 (The Weierstrass approximation theorem) The set $\mathcal{P}$ of polynomials
on a closed and bounded interval $[a, b]$ is dense in $(C[a, b], \|\cdot\|_\infty)$. Hence, any
real-valued continuous function defined on $[a, b]$ can be approximated to any degree
of accuracy by a polynomial. Also, the normed linear space $(C[a, b], \|\cdot\|_\infty)$ is a
completion of the normed linear space $(\mathcal{P}, \|\cdot\|_\infty)$ ([14], Theorems 2.2.1 and 2.2.3).
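
One constructive way to see the density of polynomials is through Bernstein polynomials, a classical device that is not taken from the text. The sketch below approximates $f(t) = |t - 1/2|$ on $[0, 1]$ and records the sup-error on a grid; the function names, the target function, and the grid are illustrative assumptions.

```python
from math import comb

def bernstein(f, n):
    """Return the degree-n Bernstein polynomial of f on [0, 1] as a callable."""
    values = [f(k / n) for k in range(n + 1)]
    def p(t):
        return sum(v * comb(n, k) * t ** k * (1.0 - t) ** (n - k)
                   for k, v in enumerate(values))
    return p

def sup_error(f, p, m=200):
    """Grid approximation of sup |f - p| over [0, 1]."""
    return max(abs(f(k / m) - p(k / m)) for k in range(m + 1))

f = lambda t: abs(t - 0.5)   # continuous, not differentiable at t = 1/2
errors = {n: sup_error(f, bernstein(f, n)) for n in (10, 50, 200)}
```

The recorded errors decrease as the degree grows, although slowly for this non-smooth target, which is consistent with approximation "to any degree of accuracy" rather than with a fast rate.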

Example B.24 We have seen in Example B.22 that $C[0, 1]$ with the norm $\|x\|_1 =
\|x\| = \int_0^1 |x(t)|\,dt$ is not complete. If we enlarge $C[0, 1]$ to the space $L^1[0, 1]$ of
real-valued Lebesgue integrable functions defined on $[0, 1]$, then $(L^1[0, 1], \|\cdot\|_1)$ is
a Banach space. We will discuss $L^p$-spaces, $p \ge 1$, in some detail in Sect. B.5. ◊


B.3.2 Basis and Separability 

Definition B.24 Let $(V, \|\cdot\|)$ be a normed linear space. A finite set $\{x_1, \ldots, x_n\}$
of elements in $V$ is a basis for $V$ if every $x \in V$ admits a unique representation
of the form $x = \sum_{i=1}^n \alpha_i x_i$. If $\{x_1, \ldots, x_n\}$ is a basis for $V$, we say that $V$ has
dimension $n$.

A countable set $\{x_1, \ldots, x_n, \ldots\}$ of elements in $V$ is a Schauder basis for $V$ if every
$x \in V$ admits a unique representation of the form $x = \sum_{i=1}^\infty \alpha_i x_i$. If $\{x_1, \ldots, x_n, \ldots\}$
is a Schauder basis for $V$, then $\sum_{i=1}^\infty \alpha_i x_i = 0$ implies $\alpha_i = 0$, $\forall i$.

Definition B.25 A normed vector space $V$ is said to be separable by a basis if it
admits a finite or a Schauder basis.

Note that a topological space is separable if it has a countable dense subset and
that a normed linear space that is separable by a basis is separable ([14], Theorem
3.3.6). For example, if $V$ is $n$-dimensional, the set $\{\sum_{i=1}^n \alpha_i x_i : \Re(\alpha_i), \Im(\alpha_i) \in \mathbb{Q}\}$
is countable and dense in $V$. We also note that separability by a basis and separability
do not coincide for Banach spaces ([14], Sect. 3.3.2), but they do for Hilbert spaces,
as shown in a subsequent section.

Example B.25 A Schauder basis for the space $l^2$ of square summable sequences
consists of the sequences $(1, 0, 0, \ldots)$, $(0, 1, 0, \ldots)$, $\ldots$ ◊


B.3.3 Operators 

Let $(V, \|\cdot\|_V)$ and $(W, \|\cdot\|_W)$ be normed linear spaces, and let $T : V \to W$ be a
function from $V$ into $W$, referred to as an operator from $V$ into $W$. If $W$ is $\mathbb{R}$ or $\mathbb{C}$,
then $T$ is said to be a functional.

Definition B.26 $T$ is a linear operator if $T(\alpha x + \beta y) = \alpha T(x) + \beta T(y)$, $x, y \in V$,
where $\alpha, \beta \in F$ are scalars. A linear operator is called bounded if there exists $M > 0$
such that $\|T(x)\|_W \le M \|x\|_V$ for all $x \in V$. The norm of $T$ is $\|T\| = \inf\{M > 0 :
\|T(x)\|_W \le M \|x\|_V, x \in V\}$.

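Example B.26 Let $V = \mathbb{R}^q$, $W = \mathbb{R}^{q'}$, and $T = \{t_{ij}\}$ a $(q', q)$-matrix with real-valued
entries. Then $T$ is a linear operator, and
$\|Tx\|^2 = \sum_{i=1}^{q'} \big(\sum_{j=1}^q t_{ij} x_j\big)^2 \le \sum_{i=1}^{q'} \big(\sum_{j=1}^q t_{ij}^2\big) \big(\sum_{j=1}^q x_j^2\big) \le q' q\,\bar{t}^2 \|x\|^2$
by the Cauchy-Schwarz inequality, where $\bar{t} = \max_{1 \le i \le q', 1 \le j \le q} |t_{ij}|$, so that $T$ is bounded. ◊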
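
The Cauchy-Schwarz bound for a matrix operator can be checked on random inputs. In the sketch below the matrix size, the entries, and the helper names are illustrative assumptions; the bound tested is $\|Tx\| \le (q' q)^{1/2}\,\bar{t}\,\|x\|$ with $\bar{t}$ the largest absolute entry of $T$.

```python
import math
import random

random.seed(1)
q, q_prime = 4, 3
# Illustrative (q', q)-matrix with real entries
T = [[random.uniform(-2.0, 2.0) for _ in range(q)] for _ in range(q_prime)]

def apply(matrix, x):
    """Matrix-vector product T x."""
    return [sum(t_ij * x_j for t_ij, x_j in zip(row, x)) for row in matrix]

def norm(x):
    """Euclidean norm."""
    return math.sqrt(sum(v * v for v in x))

t_bar = max(abs(t_ij) for row in T for t_ij in row)
M = math.sqrt(q_prime * q) * t_bar   # constant from the Cauchy-Schwarz estimate

ratios = []
for _ in range(1000):
    x = [random.gauss(0.0, 1.0) for _ in range(q)]
    ratios.append(norm(apply(T, x)) / norm(x))

largest_ratio = max(ratios)   # a lower estimate of the operator norm of T
```

The sampled ratios never exceed `M`; the bound is typically far from sharp, since it only uses the largest entry of the matrix.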

Example B.27 Let $V = W = C^1[0, 1]$ be the space of real-valued continuously
differentiable functions with the norm $\|x\|^2 = \int_0^1 x(t)^2\,dt$ and
let $T$ be the differentiation operator. The operator $T$ is unbounded since, for example,
$x(t) = \sin(n \pi t)$, $t \in [0, 1]$, is a member of $C^1[0, 1]$, has norm $\|x\|^2 = 1/2$, and its
image $Tx$ has norm $\|Tx\|^2 = \int_0^1 n^2 \pi^2 \cos^2(n \pi t)\,dt = n^2 \pi^2/2$ so that $\|Tx\|/\|x\| =
n \pi$, which grows without bound as $n \to \infty$. ◊
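
The growth of the ratio $\|Tx\|/\|x\|$ for $x(t) = \sin(n \pi t)$ can be reproduced numerically. The sketch below approximates the integrals with a midpoint rule and supplies the derivative analytically; the helper names and grid size are illustrative assumptions.

```python
import math

def l2_norm(f, m=20000):
    """Midpoint-rule approximation of (integral over [0, 1] of f(t)^2 dt)^(1/2)."""
    h = 1.0 / m
    return math.sqrt(h * sum(f((k + 0.5) * h) ** 2 for k in range(m)))

def ratio(n):
    x = lambda t: math.sin(n * math.pi * t)
    dx = lambda t: n * math.pi * math.cos(n * math.pi * t)   # derivative of x
    return l2_norm(dx) / l2_norm(x)

ratios = {n: ratio(n) for n in (1, 5, 25)}   # expected to be close to n * pi
```

No single constant $M$ can bound all of these ratios, which is the numerical counterpart of the statement that differentiation is not a bounded operator in this norm.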

Definition B.27 Let $B(V, W)$ be the set of bounded linear operators $T : V \to W$. If
$W$ is $\mathbb{R}$ or $\mathbb{C}$, then $B(V, W)$ is the set of real/complex-valued bounded linear functionals,
and is called the dual space of $V$.

Theorem B.21 The set $B(V, W)$ with the pointwise operations

$(T_1 + T_2)(x) = T_1(x) + T_2(x), \quad x \in V,$

$(\alpha T)(x) = \alpha T(x), \quad x \in V,$ (B.13)

is a linear space, where $T, T_1, T_2 \in B(V, W)$ and $\alpha$ is a scalar.

Proof Since B(V, W) is closed under these operations, it is a linear space. ▲

Theorem B.22 If T is continuous at a point $x_0 \in V$, it is continuous in V.

Proof For $x \ne x_0$ and $z = y + x - x_0$, $x, y \in V$, we have $T(z) = T(y) + T(x) - T(x_0)$
so that $\|T(z) - T(x)\| = \|T(y) - T(x_0)\| \to 0$ if $\|y - x_0\| \to 0$ since T is continuous
at $x_0$. Hence, T is continuous at any $x \in V$. ▲

Theorem B.23 A linear operator T is continuous if and only if it is bounded.

Proof Suppose T is continuous but not bounded. Then, for each $n \ge 1$, there is
$x_n \in V$ such that $\|T(x_n)\|_W > n \|x_n\|_V$, implying $\|T(z_n)\|_W > 1$, where
$z_n = x_n / (n \|x_n\|_V)$. Since $\lim_{n \to \infty} z_n = 0$, T is not continuous at the origin, in
contradiction with our assumption. If T is bounded, we have
$\|T(x) - T(y)\|_W \le M \|x - y\|_V$, $x, y \in V$, implying that T is continuous. ▲

Theorem B.24 If T is a bounded linear operator, then $T \mapsto \|T\|$ is a norm on
B(V, W), where $\|T\| = \sup_{\|x\|_V \le 1} \|T(x)\|_W = \sup_{\|x\|_V = 1} \|T(x)\|_W$.

Proof That $\|T\|$ is a norm on B(V, W) follows from (i) $\|T\| \ge 0$ by definition, and
$\|T\| = 0$ if and only if $\|T(x)\|_W = 0$ for all $x \in V$ with $\|x\|_V = 1$, implying $T(x) = 0$
for $\|x\|_V = 1$, or T = 0, (ii) $\|(\alpha T)(x)\| = |\alpha| \, \|T(x)\|$ by properties of T and of
the norm in W, and (iii) $\|(T_1 + T_2)(x)\|_W = \|T_1(x) + T_2(x)\|_W \le \|T_1(x)\|_W + \|T_2(x)\|_W \le \|T_1\| + \|T_2\|$
for $\|x\|_V = 1$ by properties of the norm in W and the definition of the
norm on B(V, W). ▲

Note that, for $x \ne 0$, we have $\|T\| \ge \|T(x / \|x\|_V)\|_W = \|T(x)\|_W / \|x\|_V$, which
implies $\|T(x)\|_W \le \|T\| \, \|x\|_V$. The latter inequality is also satisfied for x = 0, so that we
have $\|T(x)\|_W \le \|T\| \, \|x\|_V$ for all $x \in V$ ([15], Theorem II, p. 115).

Theorem B.25 If W is a Banach space, then B(V,W) with this norm is a Banach 
space. 

Proof We need to show that limits of Cauchy sequences $\{T_n\}$ in B(V, W) are in this
space. The idea of the proof is to construct the limit of $\{T_n\}$ pointwise by noting that the
sequence $\{T_n(x)\} \subset W$ is Cauchy for any $x \in V$, so that it has a limit $T(x) \in W$ since
W is Banach by assumption. To complete the proof, we need to show $T \in B(V, W)$.
This part of the proof can be found in, for example, [15] (Theorem III, p. 115). ▲

Example B.28 Let B(V, W) be the set of $(q', q)$-matrices with real- or complex-valued
entries, so that $T = \{t_{ij}\}$, $i = 1, \ldots, q'$, $j = 1, \ldots, q$, are operators in B(V, W) with
the usual matrix operations, $V = \mathbb{R}^q$ or $\mathbb{C}^q$, and $W = \mathbb{R}^{q'}$ or $\mathbb{C}^{q'}$. Norms on the vector
spaces V and W can be used to define norms for T, referred to as induced norms, and
given by

$\|T\| = \max\{\|Tx\|_W : x \in V, \|x\|_V \le 1\} = \max\{\|Tx\|_W : x \in V, \|x\|_V = 1\}$

$= \max\{\|Tx\|_W / \|x\|_V : x \in V, \|x\|_V \ne 0\}$. (B.14)

This definition of $\|T\|$ can be based on the p-norm for vectors (Example B.20). In
the special case of the Euclidean norm (p = 2), we have $\|T\| = (\lambda_{\max}(T^* T))^{1/2}$, where
$\lambda_{\max}(T^* T)$ denotes the largest eigenvalue of $T^* T$ and $T^*$ denotes the conjugate transpose
of T. It is possible to define alternative matrix norms, for example, the maximum
column or row sum of T, the norm $\|T\| = \big(\sum_{i,j} |t_{ij}|^p\big)^{1/p}$, and many other norms ([17],
Chap. 10). ♦
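The Euclidean case of (B.14) can be illustrated for a real $(2, 2)$-matrix. The sketch below is an illustration under stated assumptions, not a general algorithm: it uses the closed-form largest eigenvalue of the symmetric matrix $T^* T$ (here the transpose product, since T is real) and compares it with a direct maximization of $\|Tx\|$ over sampled unit vectors. The function names and the test matrix are hypothetical.

```python
import math

def spectral_norm_2x2(t):
    """Induced Euclidean norm of a real 2x2 matrix t = [[a, b], [c, d]].

    Uses ||T|| = sqrt(lambda_max(T^T T)) with the closed-form eigenvalues
    of the symmetric 2x2 matrix G = T^T T.
    """
    (a, b), (c, d) = t
    g11 = a * a + c * c
    g12 = a * b + c * d
    g22 = b * b + d * d
    mean = 0.5 * (g11 + g22)
    radius = math.sqrt((0.5 * (g11 - g22)) ** 2 + g12 ** 2)
    return math.sqrt(mean + radius)

def sampled_norm_2x2(t, n_angles=20000):
    """Approximate max ||Tx|| over unit vectors x = (cos a, sin a)."""
    (a, b), (c, d) = t
    best = 0.0
    for k in range(n_angles):
        # Half a circle suffices because ||T(-x)|| = ||Tx||.
        ang = math.pi * k / n_angles
        x1, x2 = math.cos(ang), math.sin(ang)
        best = max(best, math.hypot(a * x1 + b * x2, c * x1 + d * x2))
    return best

T = [[1.0, 2.0], [3.0, 4.0]]
exact = spectral_norm_2x2(T)
approx = sampled_norm_2x2(T)
```

The sampled maximum never exceeds the eigenvalue-based value and approaches it as the angular grid is refined, in agreement with the first expression in (B.14).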

Example B.29 Let $T : L_2[a, b] \to L_2[a, b]$ be an (integral) operator defined by
$Tx(t) = \int_a^b k(t, s) x(s) \, ds$, where the kernel k is a real-valued Lebesgue measurable
function on $[a, b] \times [a, b]$ that is square integrable, that is, $\int_a^b \int_a^b |k(t, s)|^2 \, dt \, ds < \infty$.
The operator is linear by properties of the Lebesgue integral and
$|Tx(t)| \le \int_a^b |k(t, s) x(s)| \, ds \le \big(\int_a^b k(t, s)^2 \, ds\big)^{1/2} \big(\int_a^b x(s)^2 \, ds\big)^{1/2}$ by properties of integrals and the
Cauchy-Schwarz inequality. We have $\|Tx\|^2 \le \|x\|^2 \int_a^b \int_a^b |k(t, s)|^2 \, dt \, ds$ by the
definition of $\|Tx\|$, so that $\|T\|^2 \le \int_a^b \int_a^b |k(t, s)|^2 \, dt \, ds$ by Theorem B.24. ♦

Definition B.28 Let V and W be normed linear spaces and let $T : V_0 \to W$ be a
linear operator, where $V_0$ is a subspace of V. We say that T is closed if, for every
$\{x_n\} \subset V_0$ converging to $x \in V$, the equality $\lim_{n \to \infty} T(x_n) = y$ implies $x \in V_0$ and
T(x) = y.


B.4 Hilbert Spaces


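Definition B.29 Let V be a linear space. A function $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{R}, \mathbb{C}$ is called
an inner product if

$\langle x, x \rangle \ge 0$, $\langle x, x \rangle = 0 \iff x = 0$ (positive definite)

$\langle x, y \rangle = \langle y, x \rangle^*$ (conjugate symmetry)

$\langle \alpha x, y \rangle = \alpha \langle x, y \rangle$, $\langle x + y, z \rangle = \langle x, z \rangle + \langle y, z \rangle$ (linearity)

(B.15)

for $x, y, z \in V$ and $\alpha \in \mathbb{R}$ or $\mathbb{C}$, where linearity in (B.15) refers to the first argument of
the inner product. Two members x and y of V are said to be orthogonal if $\langle x, y \rangle = 0$,
a property indicated by the notation $x \perp y$. A point $x \in V$ is said to be orthogonal to
a subset E of V if $\langle x, y \rangle = 0$ for all $y \in E$, and we write $x \perp E$.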

Theorem B.26 The function $x \mapsto \|x\|$ defined by

$\|x\| = \sqrt{\langle x, x \rangle}$, $x \in V$, (B.16)

is a norm on V. The norm is said to be induced by the inner product on V.

Proof The first two properties in (B.9) follow directly from the defining proper-
ties of the inner product. The triangle inequality results from the Cauchy-Schwarz
inequality,

$|\langle x, y \rangle| \le \|x\| \, \|y\|$, $x, y \in V$, (B.17)

which holds since $\|\lambda x + y\|^2 = |\lambda|^2 \|x\|^2 + \lambda \langle x, y \rangle + \lambda^* \langle y, x \rangle + \|y\|^2 \ge 0$ for $\lambda \in \mathbb{C}$
arbitrary and in particular for $\lambda = -\langle y, x \rangle / \|x\|^2$. ▲

Theorem B.27 The norm in (B.16) satisfies the identities

$\|x + y\|^2 + \|x - y\|^2 = 2\|x\|^2 + 2\|y\|^2$ (parallelogram law)

$\|x\|^2 + \|y\|^2 = \|x + y\|^2$, if $\langle x, y \rangle = 0$ (Pythagoras' theorem). (B.18)

Also, $f : V \to \mathbb{C}$ defined by $f(x) = \langle x, y \rangle$ for an arbitrary but fixed $y \in V$ is a linear
and continuous function.

Proof These statements follow from the defining properties of the inner product and
the Cauchy-Schwarz inequality. We also have $|f(x) - f(x')| = |f(x - x')| = |\langle x - x', y \rangle| \le \|x - x'\| \, \|y\|$
by the definition of f and the Cauchy-Schwarz inequality,
which shows that f is continuous. ▲

Definition B.30 A linear space V endowed with an inner product is called a Hilbert 
space if it is a Banach space with respect to the norm induced by the inner product, 
that is, the norm in (B.16). We use the notation H for Hilbert spaces. 

Example B.30 Let $L_2[a, b]$ be the vector space of all complex-valued Lebesgue
measurable functions that are square integrable on [a, b], endowed with the inner
product $\langle x, y \rangle = \int_a^b x(t) y(t)^* \, dt$. The space $L_2[a, b]$ with the norm induced by this
inner product is a Hilbert space (Theorem B.66). ♦

Example B.31 Let D be a bounded subset of $\mathbb{R}^d$. The set $L_2(D)$ of real-valued
Lebesgue measurable functions that are square integrable on D endowed with the
inner product $\langle x, y \rangle = \int_D x(t) y(t) \, dt$ is a Hilbert space (Example B.30). Related
Hilbert spaces used to obtain weak solutions for partial differential equations are
$H^1(D) = \{x : D \to \mathbb{R} : x \in L_2(D), \ \partial x / \partial t_i \in L_2(D), \ i = 1, \ldots, d\}$ endowed
with the inner product $\langle x, y \rangle_{H^1(D)} = \int_D \big(x(t) y(t) + \nabla x(t) \cdot \nabla y(t)\big) \, dt$, where
$\nabla = (\partial / \partial t_1, \ldots, \partial / \partial t_d)$, and $H_0^1(D) = \{x \in H^1(D) : x(t) = 0, \ t \in \partial D\}$ equipped
with the same inner product as $H^1(D)$, where $\partial D$ denotes the boundary of D.

The norm induced on these spaces by the inner product is $\|x\|_{H^1(D)}^2 = \int_D \big(x(t)^2 + |\nabla x(t)|^2\big) \, dt$. ♦

Theorem B.28 Let E be a closed and convex subset of a Hilbert space H. Then E 
contains an element with the smallest norm. 

Proof Set $\delta = \inf\{\|x\| : x \in E\}$ and let $\{x_n\}$ be a sequence in E with the property
$\lim_{n \to \infty} \|x_n\| = \delta$. Since E is a convex set, $(x_m + x_n)/2 \in E$ and $\|(x_m + x_n)/2\| \ge \delta$.
The parallelogram law gives $\|x_m - x_n\|^2 = 2\|x_m\|^2 + 2\|x_n\|^2 - \|x_m + x_n\|^2 \le 2\|x_m\|^2 + 2\|x_n\|^2 - 4\delta^2$,
so that $\|x_m - x_n\| \to 0$ as $m, n \to \infty$, showing that
$\{x_n\}$ is Cauchy so that $\lim_{n \to \infty} x_n = x \in H$. Since E is closed, we have $x \in E$. From
$\lim_{n \to \infty} \|x_n\| = \delta$ and the continuity of the norm, it follows that $\|x\| = \delta$. ▲


B.4.1 Basis and Fourier Representations 

We have seen that normed linear spaces can be decomposed in direct sums of linear 
subspaces. Similar decompositions are available for Hilbert spaces and can be used 
to construct Fourier representations for the members of these spaces. 

Definition B.31 Let V be a vector space endowed with an inner product. A collection
$\{e_1, e_2, \ldots\} \subset V$ of orthogonal vectors with unit length, that is, $\langle e_i, e_j \rangle = \delta_{ij}$, is called
an orthonormal set. An orthonormal set is a basis in V if there is no other orthonormal
set including it. The scalars $c_i = \langle x, e_i \rangle$, $x \in V$, are called Fourier coefficients for
$x \in V$ with respect to the orthonormal sequence $\{e_i\}$.

Theorem B.29 Let W be a closed subspace of a Hilbert space H. For any $x \in H$ we have
$x = x' + x''$, where $x' \in W$ and $x'' \perp W$. Moreover, the decomposition is unique.

Proof If $x \in W$, the decomposition is valid with $x'' = 0$. If $x \notin W$, set
$E = \{z : z = x - x', \ x' \in W\}$. Since W is a closed subspace of H, E is convex and
closed. Theorem B.28 states that there exists $x'' = x - x' \in E$, $x' \in W$, with the
smallest norm. Accordingly, $\|x'' - \lambda y\|^2 \ge \|x''\|^2$ for all $y \in W$ or, equiva-
lently, $-\lambda \langle y, x'' \rangle - \lambda^* \langle x'', y \rangle + |\lambda|^2 \|y\|^2 \ge 0$, which gives $|\langle y, x'' \rangle| \le 0$ for
$\lambda = \langle x'', y \rangle / \|y\|^2$, implying $\langle y, x'' \rangle = 0$, that is, $x'' \perp y$. ▲

Theorem B.30 If $\{v_i\}$ are mutually orthogonal vectors in a Hilbert space H, then

$\sum_{i=1}^{\infty} \|v_i\|^2 = \big\| \sum_{i=1}^{\infty} v_i \big\|^2$ (Parseval's identity) (B.19)

provided the numerical series on the left side of the equality is convergent.

Proof Let $v_1, \ldots, v_n$ be orthogonal vectors in H, that is, $\langle v_i, v_j \rangle = 0$ for $i \ne j$. We
have

$\sum_{i=1}^{n} \|v_i\|^2 = \big\| \sum_{i=1}^{n} v_i \big\|^2$ (Pythagoras' theorem). (B.20)

Since H is a Hilbert space, the sequence of partial sums $\{s_n = \sum_{i=1}^{n} v_i\}$ is Cauchy by
the convergence of $\sum_{i=1}^{\infty} \|v_i\|^2$, so that the sequence $\{s_n\}$ converges to
$\sum_{i=1}^{\infty} v_i$. ▲

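Theorem B.31 Let $\{e_n\}$ be an orthonormal sequence in a Hilbert space H. The
series $\sum_{n=1}^{\infty} \alpha_n e_n$ is convergent in H if and only if $\sum_{n=1}^{\infty} |\alpha_n|^2$ is convergent in $\mathbb{R}$.
In case of convergence, we have $\big\| \sum_{n=1}^{\infty} \alpha_n e_n \big\|^2 = \sum_{n=1}^{\infty} |\alpha_n|^2$.

Proof If $\sum_{n=1}^{\infty} \alpha_n e_n$ is a convergent series, then the sequence $s_n = \sum_{i=1}^{n} \alpha_i e_i$ of
partial sums is Cauchy, that is, $\|s_{n+k} - s_n\|^2 = \sum_{i=n+1}^{n+k} |\alpha_i|^2 \to 0$ as $n \to \infty$
for k > 0. Hence, $\sum_{i=1}^{n} |\alpha_i|^2$ is Cauchy and, therefore, convergent, so that
$\lim_{n \to \infty} \|s_n\|^2 = \lim_{n \to \infty} \sum_{i=1}^{n} |\alpha_i|^2 = \sum_{i=1}^{\infty} |\alpha_i|^2$. ▲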

Example B.32 The functions $\{e_n(t) = \exp(i 2\pi n t), \ n = 0, \pm 1, \pm 2, \ldots\}$, $t \in [0, 1]$,
define an orthonormal set in the Hilbert space $H = L_2[0, 1]$ since their inner product
is $\langle e_m, e_n \rangle = \int_0^1 e_m(t) e_n(t)^* \, dt = \int_0^1 \exp(i 2\pi (m - n) t) \, dt = \delta_{mn}$. Moreover, it is a
basis in $L_2[0, 1]$ ([16], Theorem 5.18.3). Accordingly, $x \in L_2[0, 1]$ has the represen-
tation $x = \sum_{n=-\infty}^{\infty} \alpha_n e_n$, where $\alpha_n = \langle x, e_n \rangle = \int_0^1 x(t) e_n(t)^* \, dt$. ♦
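The orthonormality of the complex exponentials and the formula for the Fourier coefficients can be checked by quadrature. The sketch below is illustrative only: it assumes a midpoint rule on [0, 1], and the names `inner`, `e`, `gram`, and the test function $x(t) = t$ are choices made here rather than part of the example.

```python
import cmath

def inner(f, g, n_steps=2000):
    """Midpoint-rule approximation of <f, g> = int_0^1 f(t) conj(g(t)) dt."""
    h = 1.0 / n_steps
    return h * sum(f((k + 0.5) * h) * g((k + 0.5) * h).conjugate()
                   for k in range(n_steps))

def e(n):
    """Basis function e_n(t) = exp(i 2 pi n t)."""
    return lambda t: cmath.exp(2j * cmath.pi * n * t)

# Gram matrix of e_{-2}, ..., e_2; it should approximate the identity.
gram = {(m, n): inner(e(m), e(n))
        for m in range(-2, 3) for n in range(-2, 3)}

# Fourier coefficient alpha_1 = <x, e_1> of x(t) = t.
# Integration by parts gives the exact value i / (2 pi).
alpha_1 = inner(lambda t: complex(t), e(1))
```

The conjugate in `inner` falls on the second argument, matching the inner product of Example B.30.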

Theorem B.32 Let $\{e_n\}$ be an orthonormal sequence in a Hilbert space H. For any
$x \in H$, we have

$\big\| x - \sum_{i=1}^{n} \langle x, e_i \rangle e_i \big\|^2 = \|x\|^2 - \sum_{i=1}^{n} |\langle x, e_i \rangle|^2$ (Bessel's equation) (B.21)

so that

$\sum_{i=1}^{n} |\langle x, e_i \rangle|^2 \le \|x\|^2$ and $\sum_{i=1}^{\infty} |\langle x, e_i \rangle|^2 \le \|x\|^2$ (Bessel's inequality), (B.22)

which shows that $\sum_{i=1}^{\infty} \langle x, e_i \rangle e_i$ is a convergent series.

Proof Since $x - \hat{x}$, $\hat{x} = \sum_{i=1}^{n} \langle x, e_i \rangle e_i$, is orthogonal to the subspace spanned
by $(e_1, \ldots, e_n)$, we have $\|x\|^2 = \|\hat{x}\|^2 + \|x - \hat{x}\|^2$ by Pythagoras' theorem, or
$\|x\|^2 = \sum_{i=1}^{n} |\langle x, e_i \rangle|^2 + \big\| x - \sum_{i=1}^{n} \langle x, e_i \rangle e_i \big\|^2$, that is, the Bessel equation. The
Bessel inequality follows from this equation since $\big\| x - \sum_{i=1}^{n} \langle x, e_i \rangle e_i \big\|^2 \ge 0$. ▲

The Bessel equation and the Pythagoras theorem imply that $\hat{x} = \sum_{i=1}^{n} \langle x, e_i \rangle e_i$ is the
best approximation of $x \in H$ belonging to the subspace $H_n$ spanned by $(e_1, \ldots, e_n)$.
For any vector $z = \sum_{i=1}^{n} \beta_i e_i$ in this subspace, we have
$\|x - z\|^2 = \|x - \hat{x} + \hat{x} - z\|^2 = \|x - \hat{x}\|^2 + \|\hat{x} - z\|^2 \ge \|x - \hat{x}\|^2$, that is, the error $\|x - z\|$ for any $z \in H_n$ is
at least as large as $\|x - \hat{x}\| = \big\| x - \sum_{i=1}^{n} \langle x, e_i \rangle e_i \big\|$.
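A small finite dimensional instance may help to fix ideas. The sketch below assumes $H = \mathbb{R}^3$ with the usual dot product and a hypothetical orthonormal pair $(e_1, e_2)$ spanning the horizontal plane; it evaluates both sides of the Bessel equation and compares the projection error with the error of another member of the same subspace.

```python
import math

def dot(u, v):
    """Real inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

x = [3.0, -1.0, 2.0]
s = 1.0 / math.sqrt(2.0)
e1 = [s, s, 0.0]     # orthonormal pair spanning the plane x3 = 0
e2 = [s, -s, 0.0]

# Fourier coefficients and the projection of x on span(e1, e2).
c = [dot(x, e1), dot(x, e2)]
x_hat = [c[0] * a + c[1] * b for a, b in zip(e1, e2)]

# Both sides of the Bessel equation.
residual = [a - b for a, b in zip(x, x_hat)]
lhs = dot(residual, residual)
rhs = dot(x, x) - sum(ci ** 2 for ci in c)

# Any other vector z of the subspace gives an error at least as large.
z = [0.5 * a - 2.0 * b for a, b in zip(e1, e2)]
err_z = dot([a - b for a, b in zip(x, z)], [a - b for a, b in zip(x, z)])
```

Here the projection is (3, -1, 0), and both sides of the Bessel equation equal 4, the squared length of the discarded third component.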

Theorem B.33 Let W be a subspace of a Hilbert space H that is dense in H, and let
$\{e_n\}$ be an orthonormal basis for W that may or may not be finite. Then $\{e_n\}$ is also
an orthonormal basis for H.

Proof If $\{e_n\}$ is finite, W is closed since finite dimensional spaces are
complete, so that W coincides with its closure H. If $\{e_n\}$ is not finite, we have the
representation $x = \sum_{n=1}^{\infty} \langle x, e_n \rangle e_n$ for
all $x \in W$. We need to show that any $x \in H$ has a similar representation. Since the
closure of W is H, for given $\varepsilon > 0$ and an arbitrary $x \in H$ there exists $y \in W$ such that
$\|x - y\| = \big\| x - \sum_{n=1}^{\infty} \langle y, e_n \rangle e_n \big\| < \varepsilon$. Since the series $\sum_{n=1}^{\infty} \langle y, e_n \rangle e_n$ is convergent,
there exists $n_0$ such that $\|y - y_n\| < \varepsilon$, $n \ge n_0$, where $y_n = \sum_{i=1}^{n} \langle y, e_i \rangle e_i$.
Since $x_n = \sum_{i=1}^{n} \langle x, e_i \rangle e_i$ is the best approximation of x in the subspace spanned
by $(e_1, \ldots, e_n)$, then $\|x - x_n\| \le \|x - y_n\|$. We have
$\|x - x_n\| \le \|x - y_n\| \le \|x - y\| + \|y - y_n\| < 2\varepsilon$ for $\varepsilon$ arbitrary, so that
$x = \lim_{n \to \infty} \sum_{i=1}^{n} \langle x, e_i \rangle e_i = \sum_{i=1}^{\infty} \langle x, e_i \rangle e_i$. ▲

Theorem B.34 Every separable Hilbert space has an orthonormal basis. 

Proof If the space has finite dimension, the Gram-Schmidt algorithm can be used
to construct such a basis. The construction is sequential. Let $\{\xi_k\}$ be a finite or
countable set of linearly independent vectors in a Hilbert space. The first member of
the orthonormal basis is $e_1 = \xi_1 / \|\xi_1\|$. The second member of this basis is
$e_2 = (\xi_2 - \alpha e_1) / \|\xi_2 - \alpha e_1\|$, where $\alpha$ is the solution of $\langle \xi_2 - \alpha e_1, e_1 \rangle = 0$. The third member of
the basis is the normalized component of $\xi_3$ orthogonal to the subspace spanned by $(\xi_1, \xi_2)$, and
so on. The proof of this property for Hilbert spaces that are not of finite dimension
can be found in [14] (Theorem 3.4.9). ▲
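The sequential construction can be sketched for real vectors in $\mathbb{R}^n$. The code below is a minimal illustration and not the text's algorithm verbatim: it subtracts the projections one at a time (the so-called modified Gram-Schmidt ordering, which agrees with the classical one in exact arithmetic), and the function name, tolerance, and input vectors are assumptions made here.

```python
import math

def dot(u, v):
    """Real inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal list spanning the same space as `vectors`.

    Vectors whose remainder has norm below `tol` are treated as linearly
    dependent on the previous ones and skipped.
    """
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            # Remove the component of w along e; for the first step this
            # is alpha = <xi_2, e_1>, the solution of <xi_2 - alpha e_1, e_1> = 0.
            alpha = dot(w, e)
            w = [wi - alpha * ei for wi, ei in zip(w, e)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:
            basis.append([wi / norm for wi in w])
    return basis

xi = [[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
e = gram_schmidt(xi)
```

For the three input vectors above, the output satisfies $\langle e_i, e_j \rangle = \delta_{ij}$ up to rounding, and $e_1$ is the first input divided by $\sqrt{2}$.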
Theorem B.35 Let $\{e_i\}_{i \in I}$ be an arbitrary orthonormal set in a Hilbert space H,
where I is an index set. The Fourier coefficients of an element $x \in H$ with respect to
$\{e_i\}_{i \in I}$ are zero, except for countably many coefficients.

Proof Set $c_i = \langle x, e_i \rangle$, $i \in I$, and note that $\{i : c_i \ne 0\} = \bigcup_{p=1}^{\infty} \{i : |c_i|^2 > 1/p\}$.
Consider a finite subset $\{c_{i_1}, \ldots, c_{i_n}\}$ of $\{c_i : c_i \ne 0\}$. The Bessel inequality gives
$\sum_{k=1}^{n} |c_{i_k}|^2 \le \|x\|^2$ so that, if $|c_{i_k}|^2 > 1/p$, $k = 1, \ldots, n$, then $n < p \|x\|^2$,
implying that the set $\{i : |c_i|^2 > 1/p\}$ has fewer than $p \|x\|^2$ members, so that
$\{i : c_i \ne 0\}$ is countable. This result gives meaning to the notation $\sum_{i \in I} |c_i|^2$,
which is $\sum_{i \in I} |c_i|^2 = \sum_{k=1}^{\infty} |c_{i_k}|^2$, where $\{c_{i_k}\}$, $k = 1, 2, \ldots$, are the non-zero
Fourier coefficients. ▲

Theorem B.36 Let $\{e_i\}_{i \in I}$ be an orthonormal set in a Hilbert space H. The
Fourier coefficients of an element $x \in H$ with respect to $\{e_i\}_{i \in I}$ satisfy the inequality

$\sum_{i \in I} |c_i|^2 \le \|x\|^2$.

Proof For an arbitrary n we have $\sum_{k=1}^{n} |c_{i_k}|^2 \le \|x\|^2$ by Bessel's inequality. The
above statement follows from this inequality by letting $n \to \infty$. ▲

Theorem B.37 Let $E = \{e_i\}_{i \in I}$ be an orthonormal set in a Hilbert space H, $x \in H$,
and $c_i = \langle x, e_i \rangle$ be Fourier coefficients. The series $\sum_{i \in I} c_i e_i$ converges to an element
$x'$ such that $x - x' \perp E$.

Proof Let $s_n = \sum_{k=1}^{n} c_{i_k} e_{i_k}$ denote the sequence of partial sums of $\sum_{i \in I} c_i e_i$
corresponding to non-zero Fourier coefficients. This sequence is Cauchy since
$\|s_m - s_n\|^2 = \sum_{k=n+1}^{m} |c_{i_k}|^2$, m > n, and the series $\sum_{k=1}^{\infty} |c_{i_k}|^2$ is convergent.
Since H is a Hilbert space, $\{s_n\}$ is also convergent. Let $x' = \sum_{k=1}^{\infty} c_{i_k} e_{i_k}$ be the
limit of $\{s_n\}$. We have $\langle s_n, e_{i_p} \rangle = c_{i_p}$ provided $p \le n$. The limit as $n \to \infty$ gives
$\langle x', e_{i_p} \rangle = c_{i_p} = \langle x, e_{i_p} \rangle$, so that $\langle x' - x, e_{i_p} \rangle = 0$ for all $p \ge 1$, implying that $x' - x$
is orthogonal to the subset of E consisting of those $e_i$ with $c_i \ne 0$. For $e_i \in E$ with
$c_i = 0$, we have $\langle x', e_i \rangle = 0 = \langle x, e_i \rangle$ or $\langle x' - x, e_i \rangle = 0$, implying $x' - x \perp E$. ▲

Theorem B.38 An orthonormal set $E = \{e_i\}_{i \in I}$ in a Hilbert space H is a basis for
H if and only if $x \perp E$ implies x = 0.

Proof Suppose E is a basis and $x \perp E$ but $x \ne 0$, for example, $\|x\| = 1$. Hence, $\{x\} \cup E$
is an orthonormal set, contradicting the assumption that E is a basis. Conversely,
suppose $x \perp E$ implies x = 0 but E is not a basis. Let $F \supset E$ be a basis for H, so
that there exists $x \in F \setminus E$, $x \ne 0$. However, $x \perp E$ so that x = 0, which leads to a
contradiction. ▲

Theorem B.39 An orthonormal set $E = \{e_i\}_{i \in I}$ in a Hilbert space H is a basis if
and only if the closure $\bar{H}_0$ of the linear space $H_0$ spanned by it coincides with H.

Proof Suppose E is a basis for H. If $\bar{H}_0 \subset H$ strictly, there exists a non-zero element in
$H \setminus \bar{H}_0$ that is orthogonal to E, in which case E is not a basis, in contradiction to
our assumption. Suppose now that $\bar{H}_0 = H$ and E is not a basis. Then, there exists
an orthonormal basis $F \supset E$. Take $y \in F \setminus E$ with $\|y\| = 1$. Since $y \in H = \bar{H}_0$, it
is an accumulation point for finite linear combinations of elements from E. Let $\{z_n\}$
be such a sequence converging to y. We have $\langle z_n, y \rangle = 0$, implying $\langle y, y \rangle = \|y\|^2 = 0$
by taking the limit as $n \to \infty$, in contradiction with $\|y\| = 1$, so that E is an orthonormal basis for H. ▲

Theorem B.40 An orthonormal set $E = \{e_i\}_{i \in I}$ in a Hilbert space H is a basis for
H if and only if every $x \in H$ admits the Fourier representation

$x = \sum_{i \in I} \langle x, e_i \rangle e_i$, $x \in H$. (B.23)

Proof If E is a basis, we have $\bar{H}_0 = H$ by the previous theorem, where $\bar{H}_0$ is the closure of the
linear space $H_0$ spanned by E. We have shown that the series $\sum_{i \in I} c_i e_i$, $c_i = \langle x, e_i \rangle$,
is convergent with limit $x'$ and that $x - x' \perp E$, so that $x - x' \perp \bar{H}_0 = H$. Since
$x - x' \in H$, we have $\langle x - x', x - x' \rangle = 0$, so that $x' = x$. Conversely, suppose the
representation $x = \sum_{i \in I} c_i e_i$ in (B.23) holds for all $x \in H$ and E is not a basis. Then
$\bar{H}_0 \subset H$ strictly, so that there exists $x \in H$ with unit norm such that $\langle x, e_i \rangle = 0$, $i \in I$. This
observation and the representation of x imply x = 0, a contradiction. Hence, E must be a basis. ▲

Theorem B.41 Let $\{e_n\}$ be an orthonormal sequence in a Hilbert space H. The
following statements are equivalent: (1) $\{e_n\}$ is an orthonormal basis in H, (2) the
equation $\langle x, y \rangle = \sum_{n=1}^{\infty} \langle x, e_n \rangle \langle y, e_n \rangle^*$ holds for all $x, y \in H$, (3) the norm satisfies
Parseval's identity in (B.19), that is, $\|x\|^2 = \sum_{n=1}^{\infty} |\langle x, e_n \rangle|^2$ holds for all $x \in H$,
and (4) the sequence $\{e_n\}$ is a total set, that is, $\langle x, e_n \rangle = 0$ for all $n \ge 1$ implies x = 0
([14], Theorem 3.4.10).

Example B.33 The set of orthonormal functions

$\big(1/\sqrt{2\pi}, \ \cos(t)/\sqrt{\pi}, \ \sin(t)/\sqrt{\pi}, \ \ldots, \ \cos(nt)/\sqrt{\pi}, \ \sin(nt)/\sqrt{\pi}, \ \ldots\big)$ (B.24)

defines a basis for $L_2[-\pi, \pi]$ so that we have the representation

$x(t) = \alpha_0/\sqrt{2\pi} + \sum_{n=1}^{\infty} \big(\alpha_n \cos(nt)/\sqrt{\pi} + \beta_n \sin(nt)/\sqrt{\pi}\big)$, $x \in L_2[-\pi, \pi]$, (B.25)

with coefficients $\alpha_0 = \int_{-\pi}^{\pi} x(t) \, dt / \sqrt{2\pi}$, $\alpha_n = \int_{-\pi}^{\pi} x(t) \cos(nt) \, dt / \sqrt{\pi}$, and
$\beta_n = \int_{-\pi}^{\pi} x(t) \sin(nt) \, dt / \sqrt{\pi}$, $n \ge 1$, by (B.23). Let $x(t) = t$ and $y(t) = 1$, $t \in [-\pi, \pi]$,
be two members of $L_2[-\pi, \pi]$, so that they admit representations of the type
in (B.25). The coefficients of these representations are $\alpha_n = 0$, $n \ge 0$, and
$\beta_n = -2\sqrt{\pi}(-1)^n / n$, $n \ge 1$, for x(t) and $\alpha_0 = \sqrt{2\pi}$, $\alpha_n = \beta_n = 0$, $n \ge 1$, for y(t),
so that $x(t) = -2 \sum_{n=1}^{\infty} (-1)^n \sin(nt)/n$ and $y(t) = 1$. The representations of x(t)
and y(t) are consistent with Theorem B.41. For example, $\langle x, y \rangle = \int_{-\pi}^{\pi} (t)(1) \, dt = 0$
by direct calculations and $\sum_n \langle x, e_n \rangle \langle y, e_n \rangle = \langle x, 1/\sqrt{2\pi} \rangle \langle y, 1/\sqrt{2\pi} \rangle = 0$. Also,
$\|x\|^2 = \int_{-\pi}^{\pi} t^2 \, dt = 2\pi^3/3$ and $\sum_n |\langle x, e_n \rangle|^2 = \sum_{n=1}^{\infty} |\beta_n|^2 = 4\pi \sum_{n=1}^{\infty} (1/n^2) = (4\pi)(\pi^2/6) = 2\pi^3/3$. ♦
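The Parseval identity of this example can be observed numerically through the partial sums of $\sum_n |\beta_n|^2$. The following sketch assumes only the closed forms $|\beta_n|^2 = 4\pi/n^2$ and $\|x\|^2 = 2\pi^3/3$ for $x(t) = t$; the function name and truncation levels are illustrative.

```python
import math

def parseval_partial_sum(n_terms):
    """Sum of |beta_n|^2 = 4*pi/n^2 for n = 1..n_terms, for x(t) = t."""
    return sum(4.0 * math.pi / n ** 2 for n in range(1, n_terms + 1))

# ||x||^2 = int_{-pi}^{pi} t^2 dt for x(t) = t.
norm_sq = 2.0 * math.pi ** 3 / 3.0

# Gap between ||x||^2 and the partial sums (Bessel's inequality keeps it >= 0).
gaps = [norm_sq - parseval_partial_sum(n) for n in (10, 100, 1000, 10000)]
```

The gaps are positive, as required by Bessel's inequality (B.22), and decrease roughly like $4\pi/n$, consistent with equality in the limit.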

Theorem B.42 The orthogonal complement $W^{\perp}$ of a subset $W \subset H$, that is, the
set $W^{\perp} = \{y \in H : \langle y, x \rangle = 0 \text{ for all } x \in W\}$, is a closed linear subspace of H ([14],
Theorem 3.5.5). If $W_1$ and $W_2$ are closed linear subspaces of a Hilbert space H such
that $W_1 \perp W_2$, then (1) the representation $x = y_1 + y_2$ in $W_1 \oplus W_2$, $y_k \in W_k$, is
unique and (2) $W_1 \oplus W_2$ is a closed subspace of H ([14], Lemma 3.5.8). If W is a
closed linear subspace of a Hilbert space H, then $H = W \oplus W^{\perp}$, a property referred
to as the projection theorem ([14], Theorem 3.5.9).


B.4.2 Linear Functionals 

This section deals with functionals, that is, mappings from vector spaces to the fields 
underlying these spaces, which for our discussion are R and C. We also consider 
functionals defined on products of vector spaces. 

Definition B.32 A functional $\varphi : V \to \mathbb{R}, \mathbb{C}$ is said to be linear if it satisfies the
conditions in Definition B.26. The functional is bounded if there is a constant c > 0
such that $|\varphi(x)| \le c \|x\|$ for all $x \in V$.

Definition B.33 A mapping $b : V \times V \to \mathbb{R}, \mathbb{C}$ is said to be a bilinear form if it is
linear and anti-linear in the first and the second arguments, respectively, that is,

$b(\alpha_1 x_1 + \alpha_2 x_2, y) = \alpha_1 b(x_1, y) + \alpha_2 b(x_2, y)$

$b(x, \beta_1 y_1 + \beta_2 y_2) = \beta_1^* b(x, y_1) + \beta_2^* b(x, y_2)$ (B.26)

for $x, x_1, x_2, y, y_1, y_2 \in V$ and arbitrary scalars $\alpha_1$, $\alpha_2$, $\beta_1$, and $\beta_2$.

Definition B.34 A bilinear form is bounded if there is a constant c > 0 such that

$|b(x, y)| \le c \|x\| \, \|y\|$, $\forall x, y \in V$. (B.27)

Example B.34 The inner product is a bounded bilinear form. The conditions in (B.26)
are satisfied by the defining properties of the inner product (B.15). For example,
$\langle x, \beta_1 y_1 + \beta_2 y_2 \rangle = \langle \beta_1 y_1 + \beta_2 y_2, x \rangle^* = (\beta_1 \langle y_1, x \rangle + \beta_2 \langle y_2, x \rangle)^* = \beta_1^* \langle x, y_1 \rangle + \beta_2^* \langle x, y_2 \rangle$
holds by conjugate symmetry and linearity in the first argument. The condi-
tion in (B.27) with c = 1 follows from the Cauchy-Schwarz inequality. ♦

Definition B.35 A bilinear form is positive definite, coercive, or elliptic if there
exists a constant $c' > 0$ such that

$b(x, x) \ge c' \|x\|^2$, $\forall x \in V$. (B.28)

Theorem B.43 (Riesz's representation theorem) If $\varphi : H \to \mathbb{C}$ is a continuous
linear functional and H is a Hilbert space, then there exists a unique vector $z \in H$
such that $\varphi(x) = \langle x, z \rangle$ for all $x \in H$.

Proof The geometric interpretation of the theorem is that $\varphi(x)$ can be viewed
as the projection $\langle x, z \rangle$ of x on a unique element $z \in H$ for all $x \in H$. More-
over, $\|\varphi\| = \sup_{\|x\| \le 1} |\varphi(x)| = \sup_{\|x\| \le 1} |\langle x, z \rangle| \le \|z\|$ by the Cauchy-Schwarz
inequality.

We now prove Riesz's theorem. The kernel $N = \{x \in H : \varphi(x) = 0\}$ of $\varphi$ is
a linear subspace in H by the linearity of $\varphi$. The subspace is closed since, if
$\{x_n\} \subset N$ converges to a limit $x \in H$, then $\varphi(x) = \lim_{n \to \infty} \varphi(x_n) = 0$ by the conti-
nuity of $\varphi$, so that $x \in N$. If N = H, the proof is completed with z = 0. Otherwise, we have
$H = N \oplus N^{\perp}$. Take $y \in N^{\perp}$ such that $\|y\| > 0$ so that $\varphi(y) \ne 0$. For any $x \in H$,
we have $\varphi(x - y \varphi(x)/\varphi(y)) = 0$. This implies $\xi = x - y \varphi(x)/\varphi(y) \in N$, so that
$x = y \varphi(x)/\varphi(y) + \xi$ and $\langle x, y \rangle = \|y\|^2 \varphi(x)/\varphi(y)$ or $\varphi(x) = \langle x, z \rangle$ for all $x \in H$,
where $z = y \varphi(y)^* / \|y\|^2$. The vector z is unique. If $z' \ne z$ has the same proper-
ties as z, we will have $0 = \langle x, z \rangle - \langle x, z' \rangle = \langle x, z - z' \rangle$, implying $\|z - z'\| = 0$ for
$x = z - z'$, so that $z = z'$. ▲
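The construction $z = y \varphi(y)^* / \|y\|^2$ in the proof can be traced in a finite dimensional real setting. The sketch below assumes $H = \mathbb{R}^3$, a hypothetical functional $\varphi(x) = \langle x, a \rangle$ whose kernel is the plane orthogonal to a, and an arbitrary non-zero y on the line $N^{\perp}$; conjugation is omitted because all quantities are real.

```python
def dot(u, v):
    """Real inner product on R^n."""
    return sum(p * q for p, q in zip(u, v))

a = [1.0, -2.0, 0.5]

def phi(x):
    """A bounded linear functional on R^3 (hypothetical example)."""
    return dot(x, a)

def riesz_representer(functional, y):
    """Return z = y * functional(y) / ||y||^2 for y orthogonal to the kernel."""
    scale = functional(y) / dot(y, y)
    return [scale * yi for yi in y]

# Any non-zero multiple of a lies in the orthogonal complement of ker(phi).
y = [2.0 * ai for ai in a]
z = riesz_representer(phi, y)

# phi(x) and <x, z> should agree for every x; a few samples are checked.
samples = [[1.0, 0.0, 0.0], [0.3, -1.2, 4.0], [-2.0, 5.0, 1.5]]
differences = [abs(phi(x) - dot(x, z)) for x in samples]
```

The representer does not depend on the scaling of y, which reflects the uniqueness part of the theorem; here z coincides with a.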

Example B.35 A functional $\varphi : L_2[a, b] \to \mathbb{C}$ is bounded and linear if and
only if there exists $y \in L_2[a, b]$ such that $\varphi(x) = \int_a^b x(t) y(t)^* \, dt = \langle x, y \rangle$ for all
$x \in L_2[a, b]$. The first part of this statement follows from the Riesz theorem since
a linear operator is continuous if and only if it is bounded (Theorem B.23). The
converse follows from the observations that $\varphi$ is linear with norm
$\|\varphi\| = \sup_{\|x\| \le 1} \big| \int_a^b x(t) y(t)^* \, dt \big|$, so that $\|\varphi\| \le \sup_{\|x\| \le 1} \|x\| \, \|y\| \le \|y\|$ by the Cauchy-Schwarz inequality.
Hence, for $x \ne 0$, we have $|\varphi(x / \|x\|)| \le \|\varphi\| \le \|y\|$, which gives
$|\varphi(x)| \le \|x\| \, \|y\|$. ♦

Theorem B.44 (Lax-Milgram theorem) If $b : H \times H \to \mathbb{R}, \mathbb{C}$ is a positive
definite, bounded bilinear form and H is a Hilbert space, then for every bounded
linear functional $\varphi : H \to \mathbb{R}, \mathbb{C}$ there exists a unique $y_\varphi \in H$ such that $\varphi(x) = b(x, y_\varphi)$
for all $x \in H$ ([18], Sect. 3.1.3).

Proof Riesz's representation theorem implies $\varphi(x) = \langle x, z_\varphi \rangle$ for all $x \in H$ and a
unique $z_\varphi \in H$ depending on $\varphi$. The Riesz theorem also implies that, for an arbi-
trary but fixed $y \in H$, we have $b(x, y) = \langle x, z_y \rangle$ for all $x \in H$ and a unique $z_y \in H$
depending on y since $b(\cdot, y)$ is a bounded linear functional. The coercivity of b
is needed to show that the image T(H) of the operator $T : H \to H$ defined by
$T(y) = z_y$ is dense in H so that any member of H is a limit of a sequence in T(H). Let
$\{z_{\varphi,n}\} \subset T(H)$ be a sequence converging to $z_\varphi \in H$ and set $\{y_{\varphi,n} = T^{-1}(z_{\varphi,n})\} \subset H$.
The idea is to show that $b(x, y_{\varphi,n})$ converges to $b(x, y_\varphi)$, $y_\varphi = \lim_{n \to \infty} y_{\varphi,n}$, and that
$b(x, y_\varphi) = \langle x, z_\varphi \rangle$, that is, $b(x, y_\varphi) = \varphi(x)$ for all $x \in H$. For a complete proof see,
for example, [18] (Sect. 3.1.3) or [19] (Theorem 13, p. 166). ▲


B.4.3 Weak Convergence


Definition B.36 A sequence $\{x_n\}$ in a Hilbert space H is weakly convergent with
weak limit x, if for all $y \in H$, the sequence of real or complex numbers $\langle x_n, y \rangle$ is
convergent with limit $\langle x, y \rangle$, that is, $\lim_{n \to \infty} \langle x_n, y \rangle = \langle x, y \rangle$ for all $y \in H$.

Theorem B.45 If $x_n \to x$ weakly, then the weak limit x is unique.

Proof Suppose $\{x_n\}$ has two limits, x and $x'$, that is, $\langle x_n, y \rangle \to \langle x, y \rangle$ and $\langle x_n, y \rangle \to \langle x', y \rangle$
for all $y \in H$ as $n \to \infty$. Since limits are unique in $\mathbb{R}$ and $\mathbb{C}$, we have
$\langle x, y \rangle = \langle x', y \rangle$ or $\langle x - x', y \rangle = 0$ for all $y \in H$, that is, $x - x' \in H^{\perp} = \{0\}$. ▲

Theorem B.46 Let $\{x_n\}$ be a sequence in a Hilbert space H. If $x_n \to x$ strongly,
that is, $\|x_n - x\| \to 0$ as $n \to \infty$, then $x_n \to x$ weakly.

Proof We have $|\langle x_n, y \rangle - \langle x, y \rangle| = |\langle x_n - x, y \rangle| \le \|x_n - x\| \, \|y\|$ for all $y \in H$ by
properties of the inner product and the Cauchy-Schwarz inequality. This implies
$\langle x_n, y \rangle \to \langle x, y \rangle$ for all $y \in H$ as $n \to \infty$. ▲

Theorem B.47 Every weakly convergent sequence in a finite dimensional Hilbert 
space is also strongly convergent. 

Proof Let $(e_1, \ldots, e_k)$ be an orthonormal basis for a k-dimensional Hilbert space H,
and let $\{x_n\}$ be a sequence in H converging weakly to x. Since $x_n = \sum_{i=1}^{k} \langle x_n, e_i \rangle e_i$ and $x = \sum_{i=1}^{k} \langle x, e_i \rangle e_i$,
we have $\|x - x_n\|^2 = \sum_{i=1}^{k} |\langle x, e_i \rangle - \langle x_n, e_i \rangle|^2 \to 0$ as $n \to \infty$ since $x_n \to x$
weakly. ▲

Theorem B.48 Every weakly convergent sequence in a Hilbert space is bounded 
([14], Theorem 3.6.7). 


B.4.4 Bounded Linear Operators

Let T be a member of B(H, H), that is, $T : H \to H$ is a bounded linear operator and
H is a Hilbert space (Definition B.27). For simplicity, we denote B(H, H) by B(H).
The review in this section is based on [14] (Chap. 4).

Definition B.37 If for $T \in B(H)$ there exists a linear bounded operator $T^* \in B(H)$
such that $\langle Tx, y \rangle = \langle x, T^* y \rangle$ for all $x, y \in H$, then $T^*$ is called the adjoint operator of
T. The existence of $T^*$ is guaranteed by the following theorem.

Theorem B.49 If $T \in B(H)$, then $T^*$ is a bounded linear operator.

Proof The linearity of $T^*$ follows from its definition and properties of the inner
product. The operator is bounded since
$\|T^* x\|^2 = \langle T^* x, T^* x \rangle = \langle T(T^* x), x \rangle \le \|T(T^* x)\| \, \|x\| \le \|T\| \, \|T^* x\| \, \|x\|$
by the definition of $T^*$, the Cauchy-Schwarz inequality, and
properties of the norm of T. We have $\|T^* x\| \le \|T\| \, \|x\|$ for all $x \in H$ by dividing
by $\|T^* x\| \ne 0$ (the case $\|T^* x\| = 0$ is trivial). The latter inequality shows that $T^*$ is
bounded and that $\|T^*\| \le \|T\|$. ▲

Definition B.38 An operator $T \in B(H)$ is self-adjoint if $T = T^*$.

Theorem B.50 The norms of the operators T and $T^*$ have the properties $\|T^*\| = \|T\|$
and $\|T^* T\| = \|T\|^2$.

Proof Arguments similar to those used to prove Theorem B.49 give $\|T\| \le \|T^*\|$,
so that $\|T\| = \|T^*\|$. Also, $\|Tx\|^2 = \langle Tx, Tx \rangle = \langle x, T^*(Tx) \rangle \le \|x\| \, \|T^* T\| \, \|x\|$,
a bound that is equal to $\|T^* T\|$ for $\|x\| = 1$, implying $\|T\|^2 \le \|T^* T\|$. Since
$\|T^* T\| \le \|T^*\| \, \|T\| = \|T\|^2$, we have $\|T^* T\| = \|T\|^2$. ▲

Theorem B.51 For any $S, T \in B(H)$, the adjoint of ST is $T^* S^*$ and $(T^*)^* = T$, where $(T^*)^*$
denotes the adjoint of $T^*$.

Proof These statements follow from the definition of the adjoint operator and prop-
erties of the inner product. For example,
$\langle Tx, y \rangle = \langle x, T^* y \rangle = \langle T^* y, x \rangle^* = \langle y, (T^*)^* x \rangle^* = \langle (T^*)^* x, y \rangle$,
so that $(T^*)^* = T$ since $x, y \in H$ are arbitrary. ▲

Theorem B.52 Every operator $T \in B(H)$ on a Hilbert space H induces the partition
$H = \ker(T^*) \oplus \overline{T(H)}$, where $\ker(T^*) = \{x \in H : T^* x = 0\}$.

Proof We have $H = \overline{T(H)} \oplus T(H)^{\perp}$ by the projection theorem since the closure
$\overline{T(H)}$ of T(H) is a closed linear subspace. It remains to show that $T(H)^{\perp} = \ker(T^*)$.
For $y \in T(H)^{\perp}$, we have $\langle x, T^* y \rangle = \langle Tx, y \rangle = 0$ for all $x \in H$, implying $T(H)^{\perp} \subset \ker(T^*)$.
If $y \in \ker(T^*)$, we have $\langle Tx, y \rangle = \langle x, T^* y \rangle = \langle x, 0 \rangle = 0$, so that $\ker(T^*) \subset T(H)^{\perp}$. ▲

Theorem B.53 If $T \in B(H)$ is self-adjoint, then $\langle Tx, x \rangle \in \mathbb{R}$. The norm of T can be
calculated from $\|T\| = \sup\{|\langle Tx, x \rangle| : x \in H, \|x\| = 1\}$.

Proof The first statement holds since $\langle Tx, x \rangle = \langle x, T^* x \rangle = \langle x, Tx \rangle = \langle Tx, x \rangle^*$. A proof of the
second part can be found in [14] (Theorem 4.1.11 and Assertion 4.1.12). ▲

Theorem B.54 If $T \in B(H)$ is self-adjoint and T(H) is dense in H, then T has an
inverse operator defined on the image T(H) of T.

Proof We have $H = \ker(T^*) \oplus \overline{T(H)}$ by Theorem B.52, implying $\ker(T^*) = \{0\}$
since T(H) is dense in H by assumption. Since T is self-adjoint, we have
$\ker(T) = \ker(T^*) = \{0\}$ so that T is injective, that is, $x \ne x'$ implies $Tx \ne Tx'$, so that
$T^{-1} : T(H) \to H$ is well defined. ▲

Definition B.39 Let V and W be normed linear spaces. A bounded linear operator
$T : V \to W$, that is, a member of B(V, W), is said to be a compact or completely
continuous operator if it maps any bounded subset A in V into a subset T(A) of W
with compact closure $\overline{T(A)}$ in W.

Definition B.40 An operator $T \in B(V, W)$ is said to be of finite rank if the dimension
of T(V), that is, the dimension of the linear subspace generated by the image of T in
W, is finite.

Theorem B.55 A bounded linear operator of finite rank is a compact operator. 

Proof If A is a bounded subset of V, then T(A) is a bounded subset in W. Moreover,
$\overline{T(A)} \subset T(V)$ since T(V) is finite dimensional and, therefore, closed. Since $\overline{T(A)}$ is bounded and closed
in a finite dimensional space, it is compact. ▲

Theorem B.56 Let H be an infinite dimensional, separable Hilbert space and

$Tx = \sum_{n=1}^{\infty} \alpha_n \langle x, e_n \rangle e_n$, $x \in H$, (B.29)

where $\{e_n\}$ is an orthonormal basis for H and $\{\alpha_n\}$ are complex numbers. If the set
$\{\alpha_n\}$ is bounded, the series in (B.29) is convergent and defines a linear operator T
on H. Moreover, T exists and is bounded if and only if $\{\alpha_n\}$ is bounded.

Proof For the first part of the proof, we use the following comparison test. If H is
a Banach space and $\{y_n\} \subset H$ is such that $\|y_n\| \le a_n$, $n \ge n_0$, and $\sum_{n=1}^{\infty} a_n$ is
convergent, then the sum $\sum_{n=1}^{\infty} y_n$ is convergent. The test results from the inequality
$\big\| \sum_{k=n+1}^{m} y_k \big\| \le \sum_{k=n+1}^{m} a_k$, $n \ge n_0$, that holds by the triangle inequality and the
fact that $\mathbb{R}$ and H are complete spaces.

The series $\sum_{n=1}^{\infty} \langle x, e_n \rangle e_n$ is convergent by Bessel's inequality for all $x \in H$. The
comparison test implies that $Tx = \sum_{n=1}^{\infty} \alpha_n \langle x, e_n \rangle e_n$ is convergent for all $x \in H$ if
$\{\alpha_n\}$ is bounded. If the series is convergent, then T in (B.29) is defined and represents
a linear operator on H.

We have $\|Tx\|^2 = \sum_{n=1}^{\infty} |\alpha_n|^2 |\langle x, e_n \rangle|^2$, $\|T e_n\| = |\alpha_n|$ for all $n \in \mathbb{N}$, and
$\|x\|^2 = \sum_{n=1}^{\infty} |\langle x, e_n \rangle|^2$, so that $\|Tx\|^2 \le \|x\|^2 \sup_{n \ge 1} |\alpha_n|^2$. Hence, T exists and is bounded
if and only if the sequence $\{\alpha_n\}$ is bounded. It can also be shown that T exists and is
compact if and only if $\lim_{n \to \infty} \alpha_n = 0$ ([14], Theorem 4.2.11). ▲

Example B.36 Let $T : V \to V$ be a matrix operator with entries $\{t_{ij} \in \mathbb{R}\}$,
$i, j = 1, \ldots, n$, defined by $x \mapsto x' = Tx$, where $x'_i = \sum_{j=1}^{n} t_{ij} x_j$, $i = 1, \ldots, n$,
and $V = \mathbb{R}^n$ denotes the space of real-valued n-dimensional vectors on which the (n, n)-matrix acts. The operator is linear and
bounded so that it is a member of B(V). ♦

Example B.37 Let $T : C[0, 1] \to C[0, 1]$ be defined by $Tx(t) = \int_0^1 k(t, s) x(s) \, ds$,
where the kernel $k(\cdot, \cdot)$ is continuous in both arguments. Then T is linear and bounded,
since k and x are continuous functions on bounded intervals and the integration
domain is bounded. ♦

Example B.38 Let $T : C^2[0, \infty) \to C^2[0, \infty)$ be an operator with domain the
space of continuous real-valued functions with continuous first and second derivatives
defined by $Tx(t) = x''(t)$, $t \in [0, \infty)$. The operator is linear but is not bounded since,
for example, $Tx(t) = 2$ for $x(t) = t^2$ so that there is no M > 0 such that
$\|Tx\| \le M \|x\|$ for all $x \in C^2[0, \infty)$. ♦

Example B.39 The operators in Examples B.36 and B.37 are adjoint. For $T$ in
Example B.36 we have
$(Tx, y) = \sum_i (\sum_j t_{ij} x_j)\,y_i = \sum_j x_j (\sum_i t_{ij} y_i) = (x, T^* y)$,
where $(T^* y)_j = \sum_i t_{ij} y_i$. Hence, the matrix representing $T^*$ is the transpose of
the matrix representing $T$. For $T$ in Example B.37 we have
$(Tx, y) = \int_0^1 (\int_0^1 k(s, t)\,x(s)\,ds)\,y(t)\,dt$
that becomes $\int_0^1 x(s) (\int_0^1 k(s, t)\,y(t)\,dt)\,ds = (x, T^* y)$ by changing the order
of integration, where $T^* y(s) = \int_0^1 k(s, t)\,y(t)\,dt$. ◊
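
The matrix part of Example B.39 can be checked numerically. The following is a minimal sketch, assuming the Euclidean inner product on $\mathbb{R}^n$ and an arbitrary random matrix; the names `inner`, `T`, `x`, and `y` are illustrative and are not taken from the text.

```python
import numpy as np

def inner(u, v):
    """Euclidean inner product on R^n."""
    return float(np.dot(u, v))

rng = np.random.default_rng(0)
n = 4
T = rng.standard_normal((n, n))   # matrix operator as in Example B.36
x = rng.standard_normal(n)
y = rng.standard_normal(n)

lhs = inner(T @ x, y)             # (Tx, y)
rhs = inner(x, T.T @ y)           # (x, T* y), with T* the transpose of T
print(lhs, rhs)                   # the two numbers agree up to round-off
```

The agreement of the two inner products for arbitrary `x` and `y` is the defining property of the adjoint, here realized by the transposed matrix.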




B.4.5 Spectral Theory 

Definition B.41 Let $T : H \to H$ be a linear operator, where $H$ is a Hilbert space.
The resolvent set $\rho(T)$ for $T$ is the subset of complex numbers $\{\zeta \in \mathbb{C}\}$ such that
$(\zeta I - T)^{-1}$ exists as a bounded linear operator on a subspace $H_0$ of $H$ dense in $H$.
The spectrum $\sigma(T)$ for $T$ is the complement of $\rho(T)$, that is,
$\sigma(T) = \mathbb{C} \setminus \rho(T)$, and has three distinct components. The point spectrum
$\sigma_p(T)$ is the subset of $\sigma(T)$ for which $\zeta I - T$ cannot be inverted, so that it
consists of the eigenvalues of $T$. The continuous spectrum $\sigma_c(T)$ is the subset of
$\sigma(T)$ with the property that $(\zeta I - T)^{-1}$ exists as a densely defined and unbounded
operator on $H$, and the residual spectrum $\sigma_r(T)$ is the subset of $\sigma(T)$ such that
$(\zeta I - T)^{-1}$ exists and is continuous, but its domain of definition is not dense in $H$
([14], 107 Chapter 5 and [15], Section IV.7).

Example B.40 Let $V$ denote the space of continuous functions $x : [0, 1] \to \mathbb{R}$
with continuous first order derivative and piecewise continuous second order derivative
such that $x(0) = x(1) = 0$ and let $T : V \to V$ be the linear operator
$Tx(t) = -d^2 x(t)/dt^2$. The general solution of $Tx = \lambda x$ has the form
$x(t) = \alpha \sin(\sqrt{\lambda}\,t) + \beta \cos(\sqrt{\lambda}\,t)$, where $\alpha, \beta$ are
arbitrary constants. The boundary condition $x(0) = 0$ implies $\beta = 0$ so that
$x(t) = \alpha \sin(\sqrt{\lambda}\,t)$. The boundary condition $x(1) = 0$ imposes the condition
$\alpha \sin(\sqrt{\lambda}) = 0$. The option $\alpha = 0$ is inadmissible since it corresponds to
the trivial solution $x(t) = 0$, $t \in [0, 1]$, so that we require $\sin(\sqrt{\lambda}) = 0$,
which implies $\lambda = \lambda_n = (n\pi)^2$, $n = 1, 2, \ldots$, and $x_n(t) = \sin(n\pi t)$,
$n = 1, 2, \ldots$, $t \in [0, 1]$. The constants $\{\lambda_n\}$ and the functions $\{x_n(t)\}$ are
the eigenvalues and the eigenfunctions of $T$. We also note that the eigenfunctions provide a
basis for the set of square integrable functions defined on $[0, 1]$ ([20], Sect. 2.2). ◊
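
The eigenvalues $\lambda_n = (n\pi)^2$ of Example B.40 can be approximated numerically. The sketch below is an illustration under stated assumptions, not part of the original development: it replaces $-d^2/dt^2$ with the standard second-order finite-difference matrix on `m` interior grid points (the function name `dirichlet_eigenvalues` and the grid size are hypothetical choices) and compares its smallest eigenvalues with $(n\pi)^2$.

```python
import numpy as np

def dirichlet_eigenvalues(m=400, k=3):
    """Smallest k eigenvalues of the finite-difference approximation of
    -d^2/dt^2 on [0, 1] with x(0) = x(1) = 0, using m interior nodes."""
    h = 1.0 / (m + 1)
    A = (np.diag(2.0 * np.ones(m))
         - np.diag(np.ones(m - 1), 1)
         - np.diag(np.ones(m - 1), -1)) / h**2
    return np.linalg.eigvalsh(A)[:k]      # eigvalsh returns ascending order

approx = dirichlet_eigenvalues()
exact = np.array([(n * np.pi) ** 2 for n in (1, 2, 3)])
print(approx)
print(exact)
```

The discrete eigenvalues approach $(n\pi)^2$ as the grid is refined, with a relative error of order $(n\pi h)^2$ for grid spacing $h$.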

Example B.41 Let $Tx(t) = -d^2 x(t)/dt^2$ be as in Example B.40, but assume that $V$ is
the vector space of continuous functions $x : [0, \infty) \to \mathbb{R}$ with continuous first
order derivative and piecewise continuous second order derivative that are square integrable
such that $x(0) = 0$. The boundary condition $x(0) = 0$ implies
$x(t) = \alpha \sin(\sqrt{\lambda}\,t)$ so that $T$ has no eigenfunctions since
$\int_0^\infty x(t)^2\,dt = \infty$. To overcome this difficulty, we
extend the classical definitions of eigenvalues and eigenfunctions.

We say that $\lambda$ is a continuum eigenvalue for $T$ if there exists a sequence of functions
$x_n(t)$ such that $\lim_{n\to\infty} \|(T - \lambda)x_n\| / \|x_n\| = 0$; $\lambda$ with this
property belongs to the continuous spectrum of $T$. If the sequence $x_n$ converges pointwise to
a function $x$, then $x$ is called a continuum eigenfunction of $T$ ([20], Sect. 2.2). Since for
$\lambda > 0$ and $t_n = (2n + 1/2)\,\pi\,\lambda^{-1/2}$ the sequence

$$x_n(t) = \sin(\sqrt{\lambda}\,t)\,1(0 \le t \le t_n) + (1 - 2(t - t_n)^2)\,1(t_n < t \le t_n + 1/2) + 2(t - t_n - 1)^2\,1(t_n + 1/2 < t \le t_n + 1)$$

converges pointwise to $x(t) = \sin(\sqrt{\lambda}\,t)$, $t \ge 0$, as $n \to \infty$ and
$\lim_{n\to\infty} \|(T - \lambda)x_n\| / \|x_n\| = 0$,
the continuum eigenvalues and eigenfunctions are $\lambda > 0$ and $\sin(\sqrt{\lambda}\,t)$, $t \ge 0$. ◊

Theorem B.57 If $H$ is a Hilbert space and $T \in B(H)$, then the resolvent set $\rho(T)$ is
open in $\mathbb{C}$ and the spectrum set $\sigma(T)$ is closed in $\mathbb{C}$. Moreover,
$\sigma(T)$ is compact and included in the ball of radius $\|T\|$ centered at 0
([14], Theorems 5.1.7 and 5.1.8).

Theorem B.58 The eigenvalues of a self-adjoint linear operator $T$ on a Hilbert space
$H$ are real-valued. Moreover, eigenvectors of distinct eigenvalues are orthogonal.

Proof Let $\lambda \in \mathbb{C}$ be an eigenvalue of $T$ and $x \ne 0$ an eigenvector
corresponding to this eigenvalue, that is, $Tx = \lambda x$, so that
$\lambda (x, x) = (Tx, x) = (x, Tx) = \lambda^* (x, x)$
by properties of the inner product and the definition of self-adjoint operators. Since
$\|x\| \ne 0$, we have $\lambda = \lambda^*$.

Let $\lambda, \mu$ be distinct eigenvalues of $T$ and $x, y$ eigenvectors corresponding to
these eigenvalues, so that $x, y \ne 0$. We have
$\lambda (x, y) = (Tx, y) = (x, Ty) = \mu (x, y)$,
implying $(\lambda - \mu)(x, y) = 0$. Since $\lambda \ne \mu$, we have $(x, y) = 0$. ▲

Example B.42 If $T = \{t_{ij}\}$ is a real-valued symmetric matrix, then $T$ is a self-adjoint
operator and its eigenvalues are real-valued by Theorem B.58. This fact can be
established by direct arguments. ◊
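
The statements of Theorem B.58 and Example B.42 can be illustrated for a small symmetric matrix. This is a sketch under the assumption of a randomly generated matrix; the variable names are hypothetical. The general-purpose solver `numpy.linalg.eigvals` is used to show that no imaginary parts arise, and `numpy.linalg.eigh` to obtain orthonormal eigenvectors.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
T = (A + A.T) / 2.0                       # real symmetric, hence self-adjoint

eigvals_general = np.linalg.eigvals(T)    # solver that does not assume symmetry
max_imag = float(np.max(np.abs(np.imag(eigvals_general))))

eigvals, eigvecs = np.linalg.eigh(T)      # symmetric solver, ascending eigenvalues
gram = eigvecs.T @ eigvecs                # inner products between eigenvectors
residual = float(np.max(np.abs(T @ eigvecs - eigvecs * eigvals)))

print(max_imag)   # imaginary parts vanish up to round-off
print(residual)   # T e_n = lambda_n e_n up to round-off
```

The matrix `gram` is the identity up to round-off, which is the orthogonality statement of Theorem B.58 in matrix form.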

Theorem B.59 If $T$ is a compact, self-adjoint linear operator defined on a separable
Hilbert space $H$, then $H$ admits an orthonormal basis $\{e_n\}$ consisting of
the eigenvectors of $T$, and $T$ has the representation
$Tx = \sum_{n=1}^\infty \lambda_n (x, e_n)\,e_n$ for
$x = \sum_{n=1}^\infty (x, e_n)\,e_n$, where $\{\lambda_n\}$ denote the eigenvalues of $T$
([14], Theorem 5.2.8).

Theorem B.60 If $T$ is a compact linear operator on a Hilbert space $H$ and $\lambda \ne 0$
is an eigenvalue of $T$, the space $H_\lambda = \{x \in H : Tx = \lambda x\}$ is a finite
dimensional subspace of $H$.

Proof $H_\lambda$ is a linear space since it is closed under addition and scalar multiplication.
Suppose $H_\lambda$ is an infinite-dimensional space, and let $\{x_n\}$ be an orthonormal sequence
in $H_\lambda$. For $m \ne n$, we have
$\|Tx_m - Tx_n\|^2 = |\lambda|^2 \|x_m - x_n\|^2 = 2|\lambda|^2 > 0$, where
the latter equality holds by properties of $\{x_n\}$. This implies that $\{Tx_n\}$ does not have
any convergent subsequence, so that $T$ cannot be compact. ▲

We have seen that the series in (B.29) converges and defines a linear operator
$T$ if the sequence $\{a_n\}$ is bounded, the eigenvectors of $T$ corresponding to distinct
eigenvalues are orthogonal if $T$ is self-adjoint, and the eigenvectors of a self-adjoint
operator $T$ corresponding to the same eigenvalue define a linear subspace.

Example B.43 The linear space $H_\lambda$ in Theorem B.60 associated with an eigenvalue
$\lambda$ of a linear compact operator $T \in B(H)$ admits an orthogonal basis in $H_\lambda$. ◊

Proof Let $x_1, x_2, \ldots$ be the eigenvectors of $T$ corresponding to an eigenvalue
$\lambda$ of this operator. Since $Tx_1 = \lambda x_1$, $Tx_2 = \lambda x_2$, $\ldots$, we have
$T(\alpha_1 x_1 + \alpha_2 x_2 + \cdots) = \lambda(\alpha_1 x_1 + \alpha_2 x_2 + \cdots)$,
that is, any member of $H_\lambda$ is an eigenvector of $T$ corresponding to $\lambda$
(see also [20], Sect. 2.2). The Gram-Schmidt algorithm can be used
to map linearly $x_1, x_2, \ldots$ into a set of orthonormal vectors. ▲
B.5 $L^p$ Spaces

Let $(\Omega, \mathcal{F}, P)$ be a probability space, $p \ge 1$, and $\mathcal{B}$ the Borel
$\sigma$-field on $\mathbb{R}$. Denote by $\mathcal{L}^p$ the set of all real-valued measurable
functions $X : (\Omega, \mathcal{F}) \to (\mathbb{R}, \mathcal{B})$ that are $p$-integrable, that
is, $X^{-1}(\mathcal{B}) \subset \mathcal{F}$ and $E[|X|^p] = \int |X|^p\,dP < \infty$. We
partition $\mathcal{L}^p$ in classes of equivalence consisting of those members of
$\mathcal{L}^p$ that differ on events of probability 0, and denote by
$L^p(\Omega, \mathcal{F}, P)$ or $L^p$ the collection of these classes of equivalence. We will
not distinguish between the members of a class of equivalence. A similar construction holds for
$\mathbb{R}$, $\mathbb{R}^d$, $\mathbb{C}$, and $\mathbb{C}^d$-valued measurable functions.

Theorem B.61 The space $L^p(\Omega, \mathcal{F}, P)$ is a vector space over the field of real or
complex numbers.

Proof Let $a \in \mathbb{R}$ or $\mathbb{C}$ and $X, Y \in L^p$. The functions $aX$ and $X + Y$
are measurable since scalar multiplication and addition are continuous mappings. The functions
$aX$ and $X + Y$ are $p$-integrable since $|aX|^p = |a|^p |X|^p$ and
$|X + Y|^p \le (|X| + |Y|)^p \le 2^p \max\{|X|^p, |Y|^p\} \le 2^p (|X|^p + |Y|^p)$,
so that $|aX|^p, |X + Y|^p \in L^1$. ▲


B.5.1 Useful Inequalities 


Let $p > 1$ and define $q > 1$ by the relationship $1/p + 1/q = 1$. A notable property
of these numbers is the inequality

$$ab \le \frac{a^p}{p} + \frac{b^q}{q}, \quad a, b \ge 0, \tag{B.30}$$

that follows from $\int_0^a \xi^{p-1}\,d\xi + \int_0^b \eta^{q-1}\,d\eta \ge ab$, which results by
comparing the areas under the graphs of the function $\eta(\xi) = \xi^{p-1}$, $\xi \ge 0$, and its
inverse $\xi(\eta) = \eta^{q-1}$, $\eta \ge 0$.

Theorem B.62 If $X \in L^p$ and $Y \in L^q$ with $1/p + 1/q = 1$, then $XY \in L^1$.

Proof Since $|XY| = |X|\,|Y|$ and $|X|, |Y| \ge 0$, (B.30) with $(|X|, |Y|)$ in place of $(a, b)$
gives $|XY| \le |X|^p/p + |Y|^q/q \in L^1$. ▲

Theorem B.63 If $X \in L^p$ and $Y \in L^q$, then

$$\int |XY|\,dP \le \left(\int |X|^p\,dP\right)^{1/p} \left(\int |Y|^q\,dP\right)^{1/q} \quad \text{(Hölder's inequality)}. \tag{B.31}$$

Proof Note first that $E[|XY|] \le (E[|X|^p])^{1/p}\,(E[|Y|^q])^{1/q}$ is an alternative form
of (B.31). Also, Hölder's inequality for $p = q = 2$ gives

$$E[|XY|] \le \left(E[|X|^2]\,E[|Y|^2]\right)^{1/2} \quad \text{(Cauchy-Schwarz inequality)}. \tag{B.32}$$

The inequality (B.30) with $a = |X| / (\int |X|^p\,dP)^{1/p}$ and
$b = |Y| / (\int |Y|^q\,dP)^{1/q}$ gives

$$|XY| \le \frac{1}{p}\,|X|^p \left(\int |X|^p\,dP\right)^{1/p - 1} \left(\int |Y|^q\,dP\right)^{1/q} + \frac{1}{q}\,|Y|^q \left(\int |Y|^q\,dP\right)^{1/q - 1} \left(\int |X|^p\,dP\right)^{1/p},$$

so that $\int |XY|\,dP \le (1/p + 1/q)\,(\int |X|^p\,dP)^{1/p}\,(\int |Y|^q\,dP)^{1/q}$ by
integration, that is, (B.31) since $1/p + 1/q = 1$. ▲

Theorem B.64 If $X, Y \in L^p$ and $p \ge 1$, then

$$\left(\int |X + Y|^p\,dP\right)^{1/p} \le \left(\int |X|^p\,dP\right)^{1/p} + \left(\int |Y|^p\,dP\right)^{1/p} \quad \text{(Minkowski's inequality)}. \tag{B.33}$$

Proof We have seen that $X + Y \in L^p$ (Theorem B.61), so that the left side of (B.33) is
meaningful. Note also that $X \in L^p$ implies $|X|^{p-1} \in L^q$ since
$(|X|^{p-1})^q = |X|^p \in L^1$. This property and the Hölder inequality applied to the right
side of the inequality
$|X + Y|^p = |X + Y|^{p-1}\,|X + Y| \le |X + Y|^{p-1}\,|X| + |X + Y|^{p-1}\,|Y|$ give (B.33). For
example, since $X + Y \in L^p$ we have $|X + Y|^{p-1} \in L^q$ so that
$\int |X + Y|^{p-1}\,|X|\,dP \le (\int |X|^p\,dP)^{1/p}\,(\int |X + Y|^p\,dP)^{1/q}$
by Hölder's inequality. Similar considerations for
the term $|X + Y|^{p-1}\,|Y|$ yield Minkowski's inequality. ▲
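
The Hölder and Minkowski inequalities can be checked on a finite probability space, where the integrals in (B.31) and (B.33) reduce to weighted sums. The sketch below assumes a sample space with `m` points, randomly generated probabilities, and the exponent `p = 3`; all names are illustrative choices.

```python
import numpy as np

def lp_norm(values, probs, p):
    """(E|X|^p)^(1/p) for a random variable on a finite probability space."""
    return float(np.sum(np.abs(values) ** p * probs) ** (1.0 / p))

rng = np.random.default_rng(2)
m = 50
probs = rng.random(m)
probs /= probs.sum()                  # probabilities of the m sample points
X = rng.standard_normal(m)            # values X(omega)
Y = rng.standard_normal(m)            # values Y(omega)
p = 3.0
q = p / (p - 1.0)                     # conjugate exponent, 1/p + 1/q = 1

holder_lhs = float(np.sum(np.abs(X * Y) * probs))
holder_rhs = lp_norm(X, probs, p) * lp_norm(Y, probs, q)
mink_lhs = lp_norm(X + Y, probs, p)
mink_rhs = lp_norm(X, probs, p) + lp_norm(Y, probs, p)

print(holder_lhs, "<=", holder_rhs)
print(mink_lhs, "<=", mink_rhs)
```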


B.5.2 $L^p$ Spaces as Normed Spaces

We have seen that $L^p$ is a linear space. It is shown that it is possible to define norms
on $L^p$ and an inner product on $L^2$.

Theorem B.65 The function $X \mapsto \|X\|_p$, $X \in L^p$, $p \ge 1$, defined by

$$\|X\|_p = \left(\int |X|^p\,dP\right)^{1/p} = \left(E[|X|^p]\right)^{1/p} \tag{B.34}$$

is a norm on $L^p$.

Proof We have $\|X\|_p \ge 0$, $\|X\|_p = 0$ if and only if $X = 0$ $P$-a.s., and
$\|aX\|_p = |a|\,\|X\|_p$ by (B.34). The triangle inequality
$\|X + Y\|_p \le \|X\|_p + \|Y\|_p$ follows from Minkowski's
inequality. Hence, $(L^p, \|\cdot\|_p)$ is a normed space. ▲

Theorem B.66 The normed space $(L^p, \|\cdot\|_p)$ is complete. Moreover, the collection
of real/complex-valued, simple $\mathcal{F}$-measurable functions $I^p$ is dense in $L^p$, so
that any member of $L^p$ can be represented by limits of sequences from $I^p$ ([15], Sects.
10.2.1 and 10.2.2, [21], Theorem 5.1.1).

Definition B.42 Let $X$ be a real-valued measurable function on $(\Omega, \mathcal{F}, P)$. The
essential supremum of $X$ is

$$\operatorname{ess\,sup}(X) = \inf\{a \in \mathbb{R} : |X| \le a \text{ a.s.}\}. \tag{B.35}$$

If there exists $M > 0$ such that $|X| \le M$ a.s., we say that $X$ is essentially bounded.

Theorem B.67 Let $X$ be a real-valued measurable function on $(\Omega, \mathcal{F}, P)$. Then

$$\|X\|_\infty = \operatorname{ess\,sup}(X) \tag{B.36}$$

is a norm on the space $L^\infty$ of real-valued measurable functions on
$(\Omega, \mathcal{F}, P)$ that are finite a.s.

Proof Since $|X + Y| \le |X| + |Y|$ and $|aX| = |a|\,|X|$, we have
$\|X + Y\|_\infty \le \|X\|_\infty + \|Y\|_\infty$ and $\|aX\|_\infty = |a|\,\|X\|_\infty$ for any
$X, Y \in L^\infty$ and scalar $a$. For example, (B.35) and (B.36) give $|X| \le \|X\|_\infty$ and
$|Y| \le \|Y\|_\infty$ a.s., so that $|X| + |Y| \le \|X\|_\infty + \|Y\|_\infty$ a.s., implying
that $\|X\|_\infty + \|Y\|_\infty$ belongs to the set
$\{a \in \mathbb{R} : |X| + |Y| \le a \text{ a.s.}\}$.
Hence, $L^\infty$ is a vector space. Since $\|X\|_\infty = 0$ holds if and only if $X = 0$ a.s.,
we conclude that $\|\cdot\|_\infty$ is a norm on $L^\infty$. ▲

Theorem B.68 If $X$ is $p$-integrable, then

$$\lim_{p\to\infty} \left(\int |X|^p\,dP\right)^{1/p} = \operatorname{ess\,sup}(X). \tag{B.37}$$

Proof Let $\operatorname{ess\,sup}(X) = M$ and note that
$J_p = (\int |X|^p\,dP)^{1/p} \le M$. If $M < \infty$, we have
$J_p \ge (\int_{A_\varepsilon} |X|^p\,dP)^{1/p} \ge (M - \varepsilon)\,P(A_\varepsilon)^{1/p}$,
where $A_\varepsilon = \{\omega \in \Omega : |X(\omega)| > M - \varepsilon\}$ has
$P(A_\varepsilon) > 0$. Since $P(A_\varepsilon)^{1/p} \to 1$ as $p \to \infty$, this implies
$\liminf_{p\to\infty} J_p \ge M - \varepsilon$. Since $J_p \le M$ and $\varepsilon > 0$ is
arbitrary, we have $\lim_{p\to\infty} J_p = M$. Suppose now $M = \infty$. Then
$|X|$ is not bounded on $\Omega$. Take $\xi > 0$ such that
$A_\xi = \{\omega \in \Omega : |X(\omega)| > \xi\}$ has non-zero measure, that is,
$P(A_\xi) > 0$. Then $J_p \ge (\int_{A_\xi} |X|^p\,dP)^{1/p} \ge \xi\,P(A_\xi)^{1/p}$ so
that $\liminf_{p\to\infty} J_p \ge \xi$, implying $\lim_{p\to\infty} J_p = \infty = M$
since $\xi$ is arbitrary ([15], Sect. 10.2). ▲

Theorem B.69 The function $(\cdot, \cdot) : L^2 \times L^2 \to \mathbb{R}$ defined by

$$(X, Y) = \int XY\,dP = E[XY], \quad X, Y \in L^2, \tag{B.38}$$

is an inner product on $L^2$.

Proof That $(\cdot, \cdot)$ satisfies the conditions in (B.26) follows from properties of
expectation. The definition is meaningful since $X, Y \in L^2$, so that $XY \in L^1$ by the
Cauchy-Schwarz inequality or Theorem B.62. The norm $X \mapsto \|X\| = (E[X^2])^{1/2}$ induced
by this inner product coincides with that in (B.34) for $p = 2$. Similar considerations
hold for complex-valued random variables. For $\mathbb{C}$-valued random variables, the inner
product is $(X, Y) = E[XY^*]$ and $\|X\|^2 = E[|X|^2]$. ▲
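
The limit in Theorem B.68 can be illustrated for a discrete random variable, for which the essential supremum is the largest absolute value taken with positive probability. The values and probabilities below are hypothetical; the norm is computed after scaling by the maximum to avoid overflow for large $p$.

```python
import numpy as np

values = np.array([0.5, 1.0, 2.0, 3.0])   # values of X
probs = np.array([0.4, 0.3, 0.2, 0.1])    # their probabilities

def lp_norm(values, probs, p):
    """(E|X|^p)^(1/p), computed as m * (E|X/m|^p)^(1/p) with m = max|X|."""
    m = np.max(np.abs(values))
    return float(m * np.sum((np.abs(values) / m) ** p * probs) ** (1.0 / p))

ess_sup = float(np.max(np.abs(values[probs > 0])))
norms = [lp_norm(values, probs, p) for p in (1, 10, 100, 1000)]
print(norms, ess_sup)
```

The computed norms increase with $p$ and approach the essential supremum from below, in agreement with $J_p \le M$ in the proof of Theorem B.68.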

B.6 Orthogonal Polynomials

Let $C^2(D) = \{f : D \to \mathbb{R},\ D \subset \mathbb{R}^d\}$ be the set of real-valued
continuous functions defined on a subset $D$ of $\mathbb{R}^d$ that have continuous first and
second partial derivatives. We define on this linear space the inner products

$$(f, g) = \int_D f(x)\,g(x)\,dx \quad \text{and} \quad (f, g)_w = \int_D w(x)\,f(x)\,g(x)\,dx, \tag{B.39}$$

where $w : D \to \mathbb{R}$ is a positive weighting function.

For $d = 1$ and $D = [a, b]$, consider the Sturm-Liouville differential equation

$$(p(x)\,y'(x))' - q(x)\,y(x) + \lambda\,w(x)\,y(x) = 0 \tag{B.40}$$

satisfying the boundary conditions

$$\begin{aligned}
\alpha_1\,y(a) + \alpha_2\,y'(a) + \alpha_3\,y(b) + \alpha_4\,y'(b) &= 0, \\
\beta_1\,y(a) + \beta_2\,y'(a) + \beta_3\,y(b) + \beta_4\,y'(b) &= 0,
\end{aligned} \tag{B.41}$$

where $p', p, q, w \in C^0[a, b]$, $p(x) > 0$, and $(\alpha_k, \beta_k)$, $k = 1, \ldots, 4$, are
constants. An alternative form of (B.40) is $\mathcal{L}y = \lambda \mathcal{A}y$, where
$\mathcal{L}y = -(py')' + qy$ and $\mathcal{A}y = wy$ are operators defined on $C^2[a, b]$. That
$\mathcal{L}$ is self-adjoint results from the definition of the inner product. Since

$$\begin{aligned}
(\mathcal{L}y, z) &= -\int_a^b (p(x)\,y'(x))'\,z(x)\,dx + \int_a^b q(x)\,y(x)\,z(x)\,dx \\
&= -p(x)\,y'(x)\,z(x)\,\Big|_a^b + \int_a^b \big(p(x)\,y'(x)\,z'(x) + q(x)\,y(x)\,z(x)\big)\,dx, \\
(y, \mathcal{L}z) &= -p(x)\,z'(x)\,y(x)\,\Big|_a^b + \int_a^b \big(p(x)\,y'(x)\,z'(x) + q(x)\,y(x)\,z(x)\big)\,dx,
\end{aligned} \tag{B.42}$$

we have $(\mathcal{L}y, z) = (y, \mathcal{L}z)$ under the condition
$p(x)\,y'(x)\,z(x)\,|_a^b = p(x)\,z'(x)\,y(x)\,|_a^b$, which holds if the boundary conditions in
(B.41) are replaced with, for example, $y(a) = y(b) = 0$. In this case, the eigenvalues of
$\mathcal{L}y = \lambda \mathcal{A}y$ are real-valued and the eigenfunctions of this equation
corresponding to distinct eigenvalues are orthogonal (Theorem B.58).

Consider (B.40) and (B.41) with $p(x) = w(x)\,\beta(x)$, $w'(x)/w(x) = \alpha(x)/\beta(x)$,
$w(x)\,\beta(x)\,|_a^b = 0$, $\alpha(x) = a_0 + a_1 x$, $\beta(x) = b_0 + b_1 x + b_2 x^2$,
$a_0, a_1, b_0, b_1, b_2$ constants, $q(x) = 0$, and $y(a)$, $y(b)$ finite. It can be shown that
the eigenvalues and the eigenfunctions of the resulting Sturm-Liouville equation
$(w(x)\,\beta(x)\,y'(x))' + \lambda\,w(x)\,y(x) = 0$ are $\lambda_n = -n[(n + 1)\,b_2 + a_1]$ and
orthogonal polynomials $y_n$ of degree $n = 1, 2, \ldots$, for example,

Jacobi: $[a, b] = [-1, 1]$, $w(x) = (1 - x)^r (1 + x)^s$, $\beta(x) = 1 - x^2$,
where $r$ and $s$ are constants,

Legendre: Jacobi for $r = s = 0$,

Chebyshev: Jacobi for $r = s = 1/2$,

Laguerre: $[a, b] \mapsto [0, \infty)$, $w(x) = x^{\xi} e^{-x}$, $\xi > -1$, and

Hermite: $[a, b] \mapsto (-\infty, \infty)$, $w(x) = e^{-x^2}$. (B.43)

We limit our discussion to Hermite polynomials and their use in the representation
of random elements with finite variance. Additional information on orthogonal
polynomials including the Askey scheme can be found elsewhere ([1], Sect. 4.1, [22],
Chap. 1).


B.6.1 Hermite Polynomials 


The function $\exp(-(x - a)^2/2)$ of $a \in \mathbb{R}$ for a fixed $x \in \mathbb{R}$ has the
Taylor expansion

$$e^{-(a - x)^2/2} = \sum_{n=0}^\infty (-1)^n\,\frac{a^n}{n!}\,\frac{d^n (e^{-x^2/2})}{dx^n} \tag{B.44}$$

about $a = 0$, so that

$$e^{ax - a^2/2} = \sum_{n=0}^\infty \frac{a^n}{n!}\,(-1)^n\,e^{x^2/2}\,\frac{d^n (e^{-x^2/2})}{dx^n}. \tag{B.45}$$

Definition B.43 The polynomial

$$H_n(x) = (-1)^n\,e^{x^2/2}\,\frac{d^n (e^{-x^2/2})}{dx^n} \tag{B.46}$$

is called the Hermite polynomial of degree $n$. For example, the Hermite polynomials
of degrees $n = 0, 1, 2$, and 3 are $H_0(x) = 1$, $H_1(x) = x$, $H_2(x) = x^2 - 1$, and
$H_3(x) = x^3 - 3x$. Note that $e^{ax - a^2/2}$ in (B.45) can be written in the form

$$e^{ax - a^2/2} = \sum_{n=0}^\infty \frac{a^n}{n!}\,H_n(x). \tag{B.47}$$

Let $d\eta(x) = (2\pi)^{-1/2} \exp(-x^2/2)\,dx = d\Phi(x) = \phi(x)\,dx$, $x \in \mathbb{R}$, be the
Gaussian measure with mean 0 and variance 1, and denote by
$L^2(\mathbb{R}, \mathcal{B}, d\eta(x))$ the Hilbert space of real-valued functions defined on the
real line that are square integrable with respect to $\eta$, that is,
$\int f(x)^2\,d\eta(x) < \infty$ for $f \in L^2(\mathbb{R}, \mathcal{B}, d\eta(x))$.
As previously, $\mathcal{B}$ is the Borel $\sigma$-field on $\mathbb{R}$.
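
The polynomials in (B.46) are the probabilists' Hermite polynomials, which NumPy provides in the module `numpy.polynomial.hermite_e`. The following sketch, in which the helper `hermite` and the test values of `x` and `a` are illustrative assumptions, checks the low-degree polynomials listed in Definition B.43 and the generating function (B.47) truncated after 30 terms.

```python
import numpy as np
from math import factorial
from numpy.polynomial import hermite_e as He

def hermite(n, x):
    """Evaluate the Hermite polynomial H_n of (B.46) at x."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0                      # select the degree-n basis polynomial
    return He.hermeval(x, coeffs)

x = np.linspace(-2.0, 2.0, 9)
a = 0.3

# Truncated right side of (B.47) and its closed form.
series = sum(a**n / factorial(n) * hermite(n, x) for n in range(30))
closed = np.exp(a * x - a**2 / 2.0)

print(np.max(np.abs(hermite(2, x) - (x**2 - 1.0))))
print(np.max(np.abs(hermite(3, x) - (x**3 - 3.0 * x))))
print(np.max(np.abs(series - closed)))
```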

Theorem B.70 The polynomials $H_n(x)$, $n \ge 0$, are orthogonal in
$L^2(\mathbb{R}, \mathcal{B}, d\eta(x))$. Moreover,
$E[H_m(X)\,H_n(Y)] = n!\,\rho^n\,\delta_{mn}$, where $(X, Y)$ are standard Gaussian
variables with correlation coefficient $\rho$.

Proof The formula in (B.47) with $a = u$ and $a = v$ gives

$$e^{(u + v)x - (u^2 + v^2)/2} = \sum_{m,n=0}^\infty \frac{u^m v^n}{m!\,n!}\,H_m(x)\,H_n(x), \quad u, v \in \mathbb{R},$$

by multiplication. The expectation of the left side of this equation,
$\int_{-\infty}^{\infty} \exp((u + v)x - (u^2 + v^2)/2)\,d\eta(x) = \exp(uv)$, is a function of
$uv$ and so must be the expectation of its right side. The latter integration can be performed
term by term by dominated convergence, so that we must have
$\int_{-\infty}^{\infty} H_m(x)\,H_n(x)\,d\eta(x) = 0$ for all $m \ne n$. This
shows that the Hermite polynomials $\{H_n(x)\}$ are orthogonal in
$L^2(\mathbb{R}, \mathcal{B}, d\eta(x))$,

$$e^{uv} = \sum_{n=0}^\infty \frac{(uv)^n}{(n!)^2} \int_{-\infty}^{\infty} H_n(x)^2\,d\eta(x),$$

$\int_{-\infty}^{\infty} H_n(x)^2\,d\eta(x) = n!$ since
$\exp(uv) = \sum_{n=0}^\infty (uv)^n/n!$, and $\{H_n(x)/\sqrt{n!}\}$ is an
orthonormal sequence in $L^2(\mathbb{R}, \mathcal{B}, d\eta(x))$.

The Taylor expansion of
$E[H_m(X)\,H_n(Y)] = \int_{\mathbb{R}^2} H_m(x)\,H_n(y)\,\phi(x, y; \rho)\,dx\,dy$
about $\rho = 0$ gives

$$E[H_m(X)\,H_n(Y)] = \sum_{k=0}^\infty \frac{\rho^k}{k!}\,\frac{d^k E[H_m(X)\,H_n(Y)]}{d\rho^k}\,\bigg|_{\rho = 0} = \frac{\rho^n}{n!}\,(n!)^2\,\delta_{mn}, \tag{B.48}$$

where $\phi(\cdot, \cdot; \rho)$ denotes the density of $(X, Y)$. The second equality holds since
$\partial \phi(x, y; \rho)/\partial \rho = \partial^2 \phi(x, y; \rho)/(\partial x\,\partial y)$
and $dH_m(x)/dx = m\,H_{m-1}(x)$ by (B.46), so that integration by parts gives

$$\frac{\partial}{\partial \rho} \int_{\mathbb{R}^2} H_m(x)\,H_n(y)\,\phi(x, y; \rho)\,dx\,dy = mn \int_{\mathbb{R}^2} H_{m-1}(x)\,H_{n-1}(y)\,\phi(x, y; \rho)\,dx\,dy.$$

The right side of this equation is 1 for $m = n = 1$ and 0 otherwise at $\rho = 0$. Repeated
differentiation of this equation gives
$d^n E[H_m(X)\,H_n(Y)]/d\rho^n\,|_{\rho = 0} = (n!)^2\,\delta_{mn}$. ▲
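
Both statements of Theorem B.70 can be verified by Gauss-Hermite quadrature, which integrates polynomials against $\exp(-x^2/2)$ exactly up to a degree set by the number of nodes. The sketch below makes several illustrative assumptions: 20 quadrature nodes, degrees up to 5, correlation $\rho = 0.6$, and the representation $Y = \rho X + \sqrt{1 - \rho^2}\,Z$ with $X$ and $Z$ independent standard Gaussian variables.

```python
import numpy as np
from math import factorial
from numpy.polynomial import hermite_e as He

def hermite(n, x):
    """Evaluate the Hermite polynomial H_n of (B.46) at x."""
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0
    return He.hermeval(x, coeffs)

# Nodes and weights for the weight exp(-x^2/2); dividing by sqrt(2 pi)
# turns the weights into probabilities of the standard Gaussian measure.
nodes, weights = He.hermegauss(20)
weights = weights / np.sqrt(2.0 * np.pi)

N = 6
gram = np.array([[np.sum(weights * hermite(m, nodes) * hermite(n, nodes))
                  for n in range(N)] for m in range(N)])
expected = np.diag([float(factorial(n)) for n in range(N)])

# Correlated pair (X, Y) built from independent X and Z on a tensor grid.
rho = 0.6
Xg, Zg = np.meshgrid(nodes, nodes, indexing="ij")
Wg = np.outer(weights, weights)
Yg = rho * Xg + np.sqrt(1.0 - rho**2) * Zg
cross = np.array([[np.sum(Wg * hermite(m, Xg) * hermite(n, Yg))
                   for n in range(N)] for m in range(N)])
expected_cross = np.diag([factorial(n) * rho**n for n in range(N)])

print(np.max(np.abs(gram - expected)))
print(np.max(np.abs(cross - expected_cross)))
```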

Theorem B.71 The collection of polynomials $\{H_n(x)/\sqrt{n!},\ n = 0, 1, \ldots\}$ is an
orthonormal basis in $L^2(\mathbb{R}, \mathcal{B}, d\eta(x))$, that is, every function $g$ in this
space has a unique series expansion

$$g(x) = \sum_{n=0}^\infty a_n\,\frac{H_n(x)}{\sqrt{n!}}, \quad \text{where} \quad a_n = \frac{1}{\sqrt{n!}} \int_{-\infty}^{\infty} g(x)\,H_n(x)\,d\eta(x), \tag{B.49}$$

and $\|g\|^2 = \sum_{n=0}^\infty a_n^2$.

Proof The essential steps of the proof can be found in [23] (Sect. 9.3). The expansion
of $g$ in (B.49) implies $\|g\|^2 = \sum_{n=0}^\infty a_n^2$ (Theorem B.41). ▲

Let $X$ be a standard Gaussian variable and $g : \mathbb{R} \to \mathbb{R}$ a measurable
function such that $E[g(X)^2] < \infty$. Then $g \in L^2(\mathbb{R}, \mathcal{B}, d\eta(x))$ and
$Y = g(X)$ has the unique representation

$$Y = g(X) = \sum_{n=0}^\infty a_n\,\frac{H_n(X)}{\sqrt{n!}}, \tag{B.50}$$

where the coefficients $a_n = E[g(X)\,H_n(X)]/\sqrt{n!}$ are given by (B.49). It is common
in applications to approximate $Y = g(X)$ by truncations

$$Y_r = \sum_{n=0}^r a_n\,\frac{H_n(X)}{\sqrt{n!}} \tag{B.51}$$

of the infinite series in (B.50). Note that $E[Y_r^2]$ underestimates $E[Y^2]$ by
$\sum_{n=r+1}^\infty a_n^2$ since
$E[g(X)^2] = \sum_{n=0}^\infty a_n^2 \ge \sum_{n=0}^r a_n^2 = E[Y_r^2]$ and that the sequence
$\{Y_r\}$ converges in m.s. to $Y$, that is, $E[(Y_r - Y)^2] \to 0$ as $r \to \infty$.

Example B.44 Consider a lognormal random variable defined by $Y = g(X) = \exp(X)$,
$X \sim N(0, 1)$. Since $g$ is measurable and $E[Y^2] < \infty$, $Y$ admits a representation
of the type in (B.50) with coefficients
$a_n = E[\exp(X)\,H_n(X)]/\sqrt{n!} = e^{1/2}/\sqrt{n!}$,
so that $Y_r = \sum_{n=0}^r e^{1/2}\,H_n(X)/n!$ provides an approximation for $Y$. We have
$Y_r \to Y$ in m.s. as $r \to \infty$ by properties of Hermite polynomials. Direct calculations
show the convergence $\lim_{r\to\infty} E[Y_r^3] = E[Y^3]$, so that $E[Y^3]$ can be
approximated by $E[Y_r^3]$ for a sufficiently large $r$ [2]. However, $E[Y_r^k]$ may or may not
provide satisfactory approximations for $E[Y^k]$, $k \ge 3$, depending on the mapping
$X \mapsto Y = g(X)$, as illustrated by the following example. ◊
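
For the lognormal variable of Example B.44, the second moments of the truncations follow in closed form from the coefficients $a_n = e^{1/2}/\sqrt{n!}$, since $E[Y_r^2] = \sum_{n=0}^r a_n^2$ and $E[Y^2] = E[e^{2X}] = e^2$. The sketch below tabulates the shortfall $E[Y^2] - E[Y_r^2]$ for a few truncation levels; the function name and the levels are illustrative choices.

```python
from math import e, factorial

def second_moment_truncated(r):
    """E[Y_r^2] = sum_{n=0}^r a_n^2 with a_n^2 = e / n! (Example B.44)."""
    return sum(e / factorial(n) for n in range(r + 1))

exact = e**2                                   # E[Y^2] = E[exp(2X)]
levels = (1, 3, 6, 10)
gaps = [exact - second_moment_truncated(r) for r in levels]
for r, gap in zip(levels, gaps):
    print(r, gap)
```

The gaps are positive and decrease rapidly with $r$, which is the statement that $E[Y_r^2]$ underestimates $E[Y^2]$ by the tail sum of the squared coefficients.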

Example B.45 Let $Y = g(X) = |X|$, where $X \sim N(0, 1)$. Since $g$ is measurable
and $E[Y^2] < \infty$, $Y$ admits an expansion of the type in (B.50) with coefficients
$a_{2n} = (-1)^{n+1} / (\sqrt{2\pi}\,2^{n-1}\,(2n - 1)\,n!)$ and $a_{2n+1} = 0$, so that
$Y_{2r} = \sum_{n=0}^r a_{2n}\,H_{2n}(X)$ [2]. This approximation converges to $Y$ in m.s. and,
therefore, in probability, as $r \to \infty$.

However, higher order moments of $Y_{2r}$ may not converge to the corresponding
moments of $Y$. For example, the numerical sequence
$\alpha_{2r} = E[(Y_{2r} - E[Y_{2r}])^4]$ diverges, that is,
$\lim_{r\to\infty} \alpha_{2r} = \infty$ ([24], Sect. 3.4, Example 2). Hence, the sequence
$\{Y_{2r}\}$ is not Cauchy in $L^4(\mathbb{R}, \mathcal{B}, d\eta(x))$ so that $\{Y_{2r}^4\}$ is
not uniformly integrable ([25], Theorem 6.6.2). Accordingly, the sequence of moments
$\{E[Y_{2r}^4]\}$ does not converge to $E[Y^4]$ ([26], Theorem 4.5.4). The practical implication
of this result is that numerical approximations of $E[Y^4]$ based on $E[Y_{2r}^4]$ are likely to
be inaccurate irrespective of the order $r$ of the approximation $Y_{2r}$ of $Y$. Similarly,
tails of the distribution of $Y$ cannot be approximated by those of $Y_{2r}$. ◊
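
The coefficients of the expansion of $Y = |X|$ in Example B.45 can be cross-checked without sampling. The sketch below assumes the unnormalized convention $a_{2n} = E[|X|\,H_{2n}(X)]/(2n)!$, consistent with $Y_{2r} = \sum_{n=0}^r a_{2n} H_{2n}(X)$, together with the standard absolute moments $E|X|^{2j+1} = \sqrt{2/\pi}\,2^j\,j!$ of a standard Gaussian variable, and compares the result with the closed form $(-1)^{n+1}/(\sqrt{2\pi}\,2^{n-1}(2n-1)\,n!)$. Function names are hypothetical.

```python
import numpy as np
from math import factorial, pi, sqrt
from numpy.polynomial import hermite_e as He

def abs_moment_odd(j):
    """E|X|^(2j+1) for X ~ N(0, 1)."""
    return sqrt(2.0 / pi) * 2**j * factorial(j)

def coeff_by_moments(n):
    """a_{2n} = E[|X| H_{2n}(X)] / (2n)!, from the monomial form of H_{2n}."""
    c = np.zeros(2 * n + 1)
    c[2 * n] = 1.0
    power = He.herme2poly(c)          # H_{2n}(x) = sum_i power[i] x^i
    expectation = sum(power[2 * j] * abs_moment_odd(j) for j in range(n + 1))
    return expectation / factorial(2 * n)

def coeff_closed_form(n):
    return (-1) ** (n + 1) / (sqrt(2.0 * pi) * 2.0 ** (n - 1) * (2 * n - 1) * factorial(n))

for n in range(6):
    print(n, coeff_by_moments(n), coeff_closed_form(n))
```

For $n = 0$ both expressions give $E|X| = \sqrt{2/\pi}$, and for $n = 1$ they give $1/\sqrt{2\pi}$.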

A slight generalization of the standard Hermite polynomials in (B.46) results by
using the Gaussian measure
$d\eta^*(x) = (2\pi\gamma)^{-1/2} \exp(-x^2/(2\gamma))\,dx$, $x \in \mathbb{R}$, corresponding
to a Gaussian variable with mean 0 and variance $\gamma > 0$ ([23], Sect. 9.3),
rather than a Gaussian variable with mean 0 and variance 1. For this measure, the
Hermite polynomials are

$$H_n(x; \gamma) = (-\gamma)^n\,e^{x^2/(2\gamma)}\,\frac{d^n (e^{-x^2/(2\gamma)})}{dx^n}, \quad x \in \mathbb{R}, \tag{B.52}$$

or $H_n(x; \gamma) = \gamma^{n/2}\,H_n(u)$ with $u = x\,\gamma^{-1/2}$. The statement of
Theorem B.70 becomes

$$E[H_m(X; \gamma_x)\,H_n(Y; \gamma_y)] = n!\,\rho^n\,\gamma_x^{n/2}\,\gamma_y^{n/2}\,\delta_{mn}, \tag{B.53}$$

where $(X, Y)$ is a bivariate Gaussian vector with $E[X] = E[Y] = 0$, $E[X^2] = \gamma_x$,
$E[Y^2] = \gamma_y$, and $E[XY] = \rho\,\gamma_x^{1/2}\,\gamma_y^{1/2}$. Theorem B.71 implies that
$\{H_n(x; \gamma)/\sqrt{\gamma^n\,n!},\ n = 0, 1, \ldots\}$ is an orthonormal system in
$L^2(\mathbb{R}, \mathcal{B}, d\eta^*(x))$ so that every function $g$ in this space has the unique
series expansion

$$g(x) = \sum_{n=0}^\infty a_n\,\frac{H_n(x; \gamma)}{\sqrt{\gamma^n\,n!}}, \quad \text{where} \quad a_n = \frac{1}{\sqrt{\gamma^n\,n!}} \int_{-\infty}^{\infty} g(y)\,H_n(y; \gamma)\,d\eta^*(y), \quad \text{and} \quad \|g\|^2 = \sum_{n=0}^\infty a_n^2. \tag{B.54}$$

The following identities,

$$\begin{aligned}
x^n &= \sum_{k=0}^{[n/2]} \frac{n!}{(2k)!\,(n - 2k)!}\,(2k - 1)!!\,\gamma^k\,H_{n-2k}(x; \gamma), \\
H_{n+1}(x; \gamma) &= x\,H_n(x; \gamma) - \gamma\,n\,H_{n-1}(x; \gamma), \\
\frac{\partial H_n(x; \gamma)}{\partial x} &= n\,H_{n-1}(x; \gamma), \\
\frac{\partial H_n(x; \gamma)}{\partial \gamma} &= -\frac{1}{2}\,\frac{\partial^2 H_n(x; \gamma)}{\partial x^2},
\end{aligned} \tag{B.55}$$

are for Hermite polynomials corresponding to the Gaussian measure
$d\eta^*(x) = (2\pi\gamma)^{-1/2} \exp(-x^2/(2\gamma))\,dx$, $x \in \mathbb{R}$, and are useful
in calculations. The symbol $[n/2]$ denotes the integer part of $n/2$ and $n!!$ is equal to
$n \cdot (n - 2) \cdots 5 \cdot 3 \cdot 1$ for $n > 0$ odd,
$n \cdot (n - 2) \cdots 6 \cdot 4 \cdot 2$ for $n > 0$ even, and 1 for $n = -1, 0$.
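
The three-term recurrence in (B.55) gives a simple way to evaluate $H_n(x; \gamma)$. The sketch below, with hypothetical function names and an arbitrary variance $\gamma = 2.5$, builds the polynomials from the recurrence and checks them against the scaling relation $H_n(x; \gamma) = \gamma^{n/2} H_n(x\,\gamma^{-1/2})$ and against the first identity in (B.55) for $n = 5$.

```python
import numpy as np
from math import factorial, prod
from numpy.polynomial import hermite_e as He

def hermite_gamma(n, x, gamma):
    """H_n(x; gamma) from H_{k+1} = x H_k - gamma k H_{k-1}, H_0 = 1, H_1 = x."""
    x = np.asarray(x, dtype=float)
    h_prev, h = np.ones_like(x), x.copy()
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, x * h - gamma * k * h_prev
    return h

def standard_hermite(n, u):
    """H_n(u) of (B.46), via NumPy's probabilists' Hermite basis."""
    c = np.zeros(n + 1)
    c[n] = 1.0
    return He.hermeval(u, c)

def double_factorial_odd(k):
    """(2k - 1)!!, with the convention (-1)!! = 1."""
    return prod(range(2 * k - 1, 0, -2))

gamma = 2.5
x = np.linspace(-3.0, 3.0, 13)

# First identity in (B.55): x^n as a combination of H_{n-2k}(x; gamma).
n = 5
power_sum = sum(factorial(n) / (factorial(2 * k) * factorial(n - 2 * k))
                * double_factorial_odd(k) * gamma**k
                * hermite_gamma(n - 2 * k, x, gamma)
                for k in range(n // 2 + 1))
print(np.max(np.abs(power_sum - x**n)))
```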


B.6.2 Homogeneous Chaos 

Let $\{B(t),\ t \ge 0\}$ be a Brownian motion on a probability space
$(\Omega, \mathcal{F}, P)$ and let $L^2[a, b]$ denote the Hilbert space of square integrable
functions $f : [a, b] \to \mathbb{R}$, $[a, b] \subset \mathbb{R}$, with respect to the Lebesgue
measure, that is, $f \in L^2[a, b]$ implies $\|f\|^2 = \int_a^b f(t)^2\,dt < \infty$. The
stochastic integral $I(f) = \int_a^b f(t)\,dB(t)$ is a Gaussian random variable on
$(\Omega, \mathcal{F}, P)$ with mean $E[I(f)] = 0$ and variance
$E[I(f)^2] = \int_a^b f(t)^2\,dt = \|f\|^2$ (Sect. 4.4). The function
$\omega \mapsto \int_a^b f(t)\,dB(t, \omega)$, $\omega \in \Omega$, is $\mathcal{F}$-measurable,
but is also measurable with respect to a smaller $\sigma$-field, the $\sigma$-field
$\mathcal{F}^B = \sigma(B(t) : a \le t \le b)$ generated by $B$. The review in this section
follows developments in [23] (Sect. 9.4). Useful information on this topic can also be found in
[27] (Sect. 2.2) and [28] (Chap. 6).

Denote by $L^2(\Omega, \mathcal{F}^B, P) = L^2_B(\Omega)$ the Hilbert space of $P$-square
integrable functions on $\Omega$ that are $\mathcal{F}^B$-measurable. The stochastic integral
$I(f)$ is a member of $L^2(\Omega, \mathcal{F}^B, P)$ and defines a mapping
$I : L^2[a, b] \to L^2(\Omega, \mathcal{F}^B, P)$ that is an isometry since the squared norms
$E[I(f)^2] = \int_a^b f(t)^2\,dt$ and $\|f\|^2 = \int_a^b f(t)^2\,dt$ of $I(f)$
and $f \in L^2[a, b]$ are equal (Sect. 4.4).

Set $J_0 = \mathbb{R}$. Let $\{J_n,\ n \ge 1\}$ denote the closures in
$L^2(\Omega, \mathcal{F}^B, P)$ of the linear space spanned by real-valued constant functions
and products $I(f_1) \cdots I(f_k)$, $f_1, \ldots, f_k \in L^2[a, b]$, $k = 1, \ldots, n$,
respectively. By construction, we have
$J_0 \subset J_1 \subset \cdots \subset J_n \subset \cdots \subset L^2(\Omega, \mathcal{F}^B, P)$.
Note that $J_n$, $n \ge 1$, are infinite-dimensional spaces.

Theorem B.72 The union $\cup_{n=0}^\infty J_n$ is dense in $L^2(\Omega, \mathcal{F}^B, P)$.

Proof See Theorem 9.4.5 in [23]. The theorem implies that an arbitrary member of
$L^2(\Omega, \mathcal{F}^B, P)$ can be approximated by a sequence in $\cup_{n=0}^\infty J_n$ to
the desired accuracy. Also, the sequence $\{J_n,\ n \ge 0\}$ of subspaces in
$L^2(\Omega, \mathcal{F}^B, P)$ can be replaced with $K_0 = J_0 = \mathbb{R}$ and
$\{K_n,\ n \ge 1\}$, where $K_n$ denotes the orthogonal complement of $J_{n-1}$ in $J_n$ so that
$J_n = J_{n-1} \oplus K_n$ can be represented by the orthogonal direct
sum of $J_{n-1}$ and $K_n$. ▲

Definition B.44 The members of $K_n$ are called homogeneous chaoses of order $n$.

Theorem B.73 The representation
$L^2(\Omega, \mathcal{F}^B, P) = \oplus_{n=0}^\infty K_n$ holds, so that each
member $\phi \in L^2(\Omega, \mathcal{F}^B, P)$ has a unique homogeneous chaos expansion

$$\phi = \sum_{n=0}^\infty \phi_n, \quad \phi_n \in K_n, \tag{B.56}$$

and $\|\phi\|^2 = \sum_{n=0}^\infty \|\phi_n\|^2$, where $\|\cdot\|$ is the norm in
$L^2_B(\Omega)$.

Proof See Theorem 9.4.7 in [23]. We illustrate the expansion in (B.56) by two
elementary examples.

If $\phi = \alpha + \beta\,I(f)$, $\alpha, \beta \in \mathbb{R}$, $f \in L^2[a, b]$, then
$\phi \in J_1$, $\phi_0 = \alpha \in K_0$, and $\phi_1 = \beta\,I(f) \in K_1$. Note that
$E[(\phi - \alpha)\,c] = E[\phi_1 c] = c\,\beta\,E[I(f)] = 0$ for $c \in K_0 = \mathbb{R}$
arbitrary, so that $\phi_1$ is orthogonal to $K_0$.

If $\phi = I(f)^2 - \|f\|^2$, then $\phi \in J_2$, $E[\phi\,c] = 0$ for
$c \in K_0 = \mathbb{R}$ arbitrary implying $\phi \perp K_0$, and
$E[\phi\,I(g)] = E[I(f)^2 I(g)] - \|f\|^2 E[I(g)] = 0$, $g \in L^2[a, b]$, implying
$\phi \perp K_1$. The latter equality holds since $E[I(g)] = 0$ and
$E[I(f)^2 I(g)] = E[\int_{[a,b]^3} f(s)\,f(t)\,g(u)\,dB(s)\,dB(t)\,dB(u)] = 0$ by properties of
multidimensional Wiener-Ito integrals discussed in the following section. Since $\phi$ is a
square of a stochastic integral, it must be in $K_0 \oplus K_1 \oplus K_2$, so that
$\phi = \sum_{n=0}^2 \phi_n$. Since $\phi$ has no projections on $K_0$
and $K_1$, then $\phi_0 = \phi_1 = 0$ and $\phi = \phi_2 = I(f)^2 - \|f\|^2$. ▲

Example B.46 Let $I(f) = \int_a^b f(t)\,dB(t)$ be an Ito integral, where $f \in L^2[a, b]$ and
$B$ denotes a Brownian motion. Then, $I(f)^2 - \|f\|^2 \in K_2$ is a polynomial chaos
of order $n = 2$ that coincides with the double integral
$\int_a^b \int_a^b f(s)\,f(t)\,dB(s)\,dB(t)$,
referred to as the Wiener-Ito integral (Sect. B.6.3). ◊

Proof Set $X(t) = \int_a^t f(s)\,dB(s)$, or equivalently, $dX(t) = f(t)\,dB(t)$ with
$X(a) = 0$, $t \in [a, b]$. The integral form of Ito's formula (Sect. 5.2) applied to $X(t)^2$
gives $X(b)^2 = 2 \int_a^b X(t)\,dX(t) + \int_a^b f(t)^2\,dt$ so that
$2 \int_a^b f(t)\,X(t)\,dB(t) = I(f)^2 - \|f\|^2$.
Theorem B.78, stated in a subsequent section, implies

$$\int_a^b \int_a^b f(s)\,f(t)\,dB(s)\,dB(t) = 2 \int_a^b f(t) \left[\int_a^t f(s)\,dB(s)\right] dB(t)$$

or

$$\int_a^b \int_a^b f(s)\,f(t)\,dB(s)\,dB(t) = 2 \int_a^b f(t)\,X(t)\,dB(t),$$

so that we have $I(f)^2 - \|f\|^2 = \int_a^b \int_a^b f(s)\,f(t)\,dB(s)\,dB(t)$. ▲
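
A discrete-time analogue of the argument in this proof can be examined on a simulated path. The sketch below is an illustration under stated assumptions: the interval $[0, 1]$, a hypothetical integrand $f(t) = \cos(2\pi t) + t$, and Riemann-Ito sums with left endpoints. For increments $f(t_i)\,\Delta B_i$ the algebraic identity $(\sum_i c_i)^2 = \sum_i c_i^2 + 2 \sum_i c_i \sum_{j<i} c_j$ is the discrete counterpart of $I(f)^2 = \|f\|^2 + 2 \int_a^b f(t)\,X(t)\,dB(t)$, with $\sum_i c_i^2$ approximating $\|f\|^2$.

```python
import numpy as np

rng = np.random.default_rng(3)
n_steps = 200_000
a, b = 0.0, 1.0
dt = (b - a) / n_steps
t = a + dt * np.arange(n_steps)            # left endpoints (Ito convention)
f = np.cos(2.0 * np.pi * t) + t            # hypothetical integrand
dB = rng.standard_normal(n_steps) * np.sqrt(dt)

incr = f * dB                              # f(t_i) dB_i
X_left = np.concatenate(([0.0], np.cumsum(incr)[:-1]))   # X(t_i) before step i

I_f = incr.sum()                           # approximates I(f)
ito_sum = 2.0 * np.sum(X_left * incr)      # approximates 2 * int f X dB
quad_var = np.sum(incr**2)                 # discrete counterpart of ||f||^2
norm_sq = np.sum(f**2) * dt                # Riemann sum for ||f||^2

print(I_f**2 - quad_var, ito_sum)          # equal up to round-off
print(quad_var, norm_sq)                   # close for a fine partition
```

The first printed pair agrees exactly up to round-off for any path, while the second pair agrees only approximately, with an error that decreases as the partition is refined.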

Example B.47 Let $I(f) = \int_a^b f(t)\,dB(t)$ as in the previous example. Then

$$I(f)^n = H_n(I(f); \|f\|^2) + \frac{n(n - 1)}{2}\,\|f\|^2\,H_{n-2}(I(f); \|f\|^2) + \cdots \tag{B.57}$$

by the first identity in (B.55) with $I(f)$ and $\|f\|^2$ in place of $x$ and $\gamma$,
respectively. ◊

Definition B.45 Let $\{e_k,\ k = 1, 2, \ldots\}$ be an orthonormal basis in $L^2[a, b]$ and
consider a sequence $\{n_k,\ k = 1, 2, \ldots\}$ of nonnegative integers with finite sum. Set

$$\mathcal{H}_{n_1, n_2, \ldots} = \prod_{k=1}^\infty \frac{1}{\sqrt{n_k!}}\,H_{n_k}(\tilde{e}_k), \tag{B.58}$$

where $\tilde{e}_k = I(e_k) = \int_a^b e_k(t)\,dB(t)$. Since $n_k \ge 0$ and
$\sum_{k=1}^\infty n_k < \infty$, the product in the defining formula for
$\mathcal{H}_{n_1, n_2, \ldots}$ has a finite number of terms.

Theorem B.74 The collection of functions
$\{\mathcal{H}_{n_1, n_2, \ldots} : n_1 + n_2 + \cdots = n\}$ is an
orthonormal basis for $K_n$.

Proof See Lemma 9.5.3 and Theorem 9.5.4 in [23]. The theorem states that any
member of $K_n$, that is, any polynomial chaos of order $n$, can be expressed as a linear
form of $\mathcal{H}_{n_1, n_2, \ldots}$ with indices $n_1, n_2, \ldots$ such that
$n_1 + n_2 + \cdots = n$.

For example, the orthonormal bases for $K_0$, $K_1$, $K_2$, and $K_3$ are a real constant,
$\{H_1(\tilde{e}_k)\}$ for $k = 1, 2, \ldots$, $\{H_1(\tilde{e}_k)\,H_1(\tilde{e}_l)\}$ and
$\{H_2(\tilde{e}_k)/\sqrt{2!}\}$ for $k, l = 1, 2, \ldots$, $k \ne l$, and
$\{H_1(\tilde{e}_k)\,H_1(\tilde{e}_l)\,H_1(\tilde{e}_q)\}$,
$\{H_2(\tilde{e}_k)\,H_1(\tilde{e}_l)/\sqrt{2!}\}$, and $\{H_3(\tilde{e}_k)/\sqrt{3!}\}$ for
$k, l, q = 1, 2, \ldots$, $k \ne l \ne q$, respectively. ▲

Example B.48 Let $f \in L^2[a, b]$ so that it admits the representation
$f = \sum_{k=1}^\infty a_k e_k$ with $a_k = (f, e_k) = \int_a^b f(x)\,e_k(x)\,dx$. Since
$I(f) \in K_1$, it admits the representation
$I(f) = \sum_{k=1}^\infty a_k I(e_k) = \sum_{k=1}^\infty a_k \tilde{e}_k = \sum_{k=1}^\infty a_k H_1(\tilde{e}_k)$
by Theorem B.74, where $\tilde{e}_k = I(e_k)$. The representation is consistent with the fact
that $\{H_1(\tilde{e}_k) = \tilde{e}_k\}$ is an orthonormal basis for $K_1$ (Theorem B.74). ◊

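Theorem B.75 The functions
$\{\mathcal{H}_{n_1, n_2, \ldots} : n_1 + n_2 + \cdots = n,\ n = 0, 1, \ldots\}$ define an
orthonormal basis for $L^2(\Omega, \mathcal{F}^B, P)$ so that every
$\phi \in L^2(\Omega, \mathcal{F}^B, P)$ admits the representation

$$\phi = \sum_{n=0}^\infty\ \sum_{n_1 + n_2 + \cdots = n} a_{n_1, n_2, \ldots}\,\mathcal{H}_{n_1, n_2, \ldots}, \tag{B.59}$$

where $n_1, n_2, \ldots \ge 0$ and
$a_{n_1, n_2, \ldots} = E[\phi\,\mathcal{H}_{n_1, n_2, \ldots}]$.

Proof See Theorem 9.5.7 in [23]. Note that $L^2(\Omega, \mathcal{F}^B, P)$ is much larger than
the space defined by measurable mappings of a Gaussian variable as considered in
(B.50), and that the representation in (B.59) is a summation of Hermite polynomials
$\{H_{n_k}(\tilde{e}_k)\}$, $\{H_{n_k}(\tilde{e}_k)\,H_{n_l}(\tilde{e}_l), \ldots\}$ such that
$n_k \ge 0$, $n_1 + n_2 + \cdots = n$, and $n \ge 0$, that is,

$$\phi = a_0 + \sum_{k=1}^\infty a_{1,k}\,H_1(\tilde{e}_k) + \sum_{k,l=1,\,k \ne l}^\infty a_{2,kl}\,H_1(\tilde{e}_k)\,H_1(\tilde{e}_l) + \sum_{k=1}^\infty a_{2,kk}\,H_2(\tilde{e}_k)/\sqrt{2!} + \cdots, \tag{B.60}$$

where $\{\tilde{e}_k\}$ are independent Gaussian variables. In the expression of $\phi$, the
first term $a_0$ is a member of $\mathcal{H}_{n_1, n_2, \ldots}$ with $n = 0$, the second term
$\sum_{k=1}^\infty a_{1,k}\,H_1(\tilde{e}_k)$ includes members of
$\mathcal{H}_{n_1, n_2, \ldots}$ with a single non-zero $n_k = 1$, $k = 1, 2, \ldots$, and the
latter two terms
$\sum_{k,l=1,\,k \ne l}^\infty a_{2,kl}\,H_1(\tilde{e}_k)\,H_1(\tilde{e}_l)$ and
$\sum_{k=1}^\infty a_{2,kk}\,H_2(\tilde{e}_k)/\sqrt{2!}$ consist of members of
$\mathcal{H}_{n_1, n_2, \ldots}$ with $n = 2$. ▲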

Example B.49 Let $h : \mathbb{R}^2 \to \mathbb{R}$ be a measurable function, and
$X = h(G_1, G_2)$, where $G_k$, $k = 1, 2$, are independent $N(0, 1)$. The random variables
$G_k$ can be viewed as two stochastic integrals $I(f_k) = \int_a^b f_k(t)\,dB(t)$ corresponding
to two functions $f_k \in L^2[a, b]$ such that $\int_a^b f_k(t)^2\,dt = 1$, $k = 1, 2$, and
$\int_a^b f_1(t)\,f_2(t)\,dt = 0$
since $E[I(f_k)] = 0$ and $E[I(f_1)\,I(f_2)] = \int_a^b f_1(t)\,f_2(t)\,dt$ (Sect. 4.4). The
random variable $X$ admits the approximate representation

$$X \simeq a_0 + \sum_{k=1}^2 a_k\,H_1(G_k) + a_{12}\,H_1(G_1)\,H_1(G_2) + \sum_{k=1}^2 a_{kk}\,H_2(G_k)/\sqrt{2!} \tag{B.61}$$

by retaining from (B.59) terms of order $n \le 2$, where $a_0 = E[h(G_1, G_2)]$,
$a_k = E[h(G_1, G_2)\,H_1(G_k)]$, $k = 1, 2$,
$a_{kk} = E[h(G_1, G_2)\,H_2(G_k)]/\sqrt{2!}$, $k = 1, 2$,
and $a_{12} = E[h(G_1, G_2)\,H_1(G_1)\,H_1(G_2)]$. ◊

Fig. B.1 Samples of $B(t)$ using the first 600 terms of the series in (B.62)

Example B.50 Let $\phi(t) = B(t)$, $0 \le t \le 1$, so that the coefficients
$a_{n_1, n_2, \ldots}$ of the representation in (B.59) are 0 for $n = 0$,
$\int_0^t e_k(s)\,ds$, $k = 1, 2, \ldots$, for $n = 1$, and 0 for $n \ge 2$, so that

$$B(t) = \sum_{k=1}^\infty \left(\int_0^t e_k(s)\,ds\right) \tilde{e}_k, \tag{B.62}$$

where $\{e_k(x)\}$ is an orthonormal basis in $L^2[0, 1]$ and
$\tilde{e}_k = I(e_k) = \int_0^1 e_k(t)\,dB(t)$. Figure B.1 shows ten samples
of $B(t)$ obtained from the first 600 terms of the series representation in (B.62) with
$e_k(t) = \sqrt{2} \sin((k - 1/2)\,\pi t)$. ◊

Proof The coefficients in (B.59) are zero for $n = 0$ since they are equal to
the expectation $E[\phi(t)] = E[B(t)] = 0$. The coefficients for $n = 1$ are the integrals
$\int_0^t e_k(s)\,ds$ since, for example,

$$\begin{aligned}
a_{1,0,0,\ldots}(t) &= E[\phi(t)\,H_1(\tilde{e}_1)\,H_0(\tilde{e}_2)\,H_0(\tilde{e}_3) \cdots] = E[\phi(t)\,H_1(\tilde{e}_1)] \\
&= E\left[\int_{[0,1]^2} 1(0 \le u \le t)\,dB(u)\,e_1(s)\,dB(s)\right] \\
&= \int_{[0,1]^2} 1(0 \le u \le t)\,e_1(s)\,E[dB(u)\,dB(s)] = \int_0^t e_1(s)\,ds,
\end{aligned}$$

by using $E[dB(u)\,dB(s)] = \delta(u - s)\,du\,ds$. Similar calculations show that
$a_{n_1, n_2, \ldots} = 0$ for $n_1 + n_2 + \cdots = n \ge 2$. The integrals in (B.62) for
$e_k(t) = \sqrt{2} \sin((k - 1/2)\,\pi t)$ are
$\int_0^t e_k(s)\,ds = \sqrt{2}\,[1 - \cos((k - 1/2)\,\pi t)] / ((k - 1/2)\,\pi)$. ▲
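
The series representation of Example B.50 can be sampled directly. The following sketch assumes the basis $e_k(t) = \sqrt{2} \sin((k - 1/2)\,\pi t)$, independent standard Gaussian coefficients generated with an arbitrary seed, and 600 terms; the function names are hypothetical. Because the basis is orthonormal and complete in $L^2[0, 1]$, the variance of the truncated series at time $t$, $\sum_k (\int_0^t e_k(s)\,ds)^2$, should approach $\mathrm{Var}[B(t)] = t$, which provides a deterministic check.

```python
import numpy as np

def basis_integrals(t, n_terms):
    """Integrals over [0, t] of e_k(s) = sqrt(2) sin((k - 1/2) pi s), k = 1..n_terms.
    Returns an array of shape (len(t), n_terms)."""
    freq = (np.arange(1, n_terms + 1) - 0.5) * np.pi
    return np.sqrt(2.0) * (1.0 - np.cos(np.outer(t, freq))) / freq

def sample_paths(t, n_terms=600, n_paths=10, seed=0):
    """Samples of the truncated series for B(t), one path per column."""
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_terms, n_paths))   # independent N(0, 1) variables
    return basis_integrals(t, n_terms) @ eps

t = np.linspace(0.0, 1.0, 101)
paths = sample_paths(t)                             # shape (101, 10)
var_series = np.sum(basis_integrals(t, 600) ** 2, axis=1)
print(np.max(np.abs(var_series - t)))               # small truncation error
```

Plotting the columns of `paths` against `t` gives sample paths of the kind described for Fig. B.1.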

We conclude this section with a comment on the relationship between the Hermite
polynomial expansion in (B.50) for square integrable functions in $L^2(\mathbb{R}, \mathcal{B}, \gamma)$ and
the polynomial chaos expansion in (B.59) for functions in $L^2(\Omega, \mathcal{F}^B, P)$. The
representation in (B.59) can be viewed as an infinite-dimensional analogue of the
one-dimensional representation in (B.50). Truncated versions of the series in (B.59)
provide approximations for the members of $L^2(C, \mathcal{B}(C), \mu)$, where $C$ denotes the
Banach space of real-valued continuous functions defined on $[0, 1]$ and starting at
zero, $\mathcal{B}(C)$ denotes the Borel $\sigma$-field on $C$, and $\mu$ is the Wiener measure. This
measure can be constructed by extending the set function

$$\mu(A) = \int_U \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi (t_i - t_{i-1})}} \exp\left[ -\frac{(u_i - u_{i-1})^2}{2 (t_i - t_{i-1})} \right] du_1 \cdots du_n \tag{B.63}$$

defined on the set $\mathcal{R}$ of cylindrical subsets $A = \{\omega \in C : (\omega(t_1), \ldots, \omega(t_n)) \in U\}$ to
the $\sigma$-field $\mathcal{B}(C)$, where $0 < t_1 < \cdots < t_n \le 1$, $t_0 = u_0 = 0$, and $U \in \mathcal{B}(\mathbb{R}^n)$ ([23],
Theorem 3.1.1).
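For a cylinder set whose base $U$ is a rectangle, the Wiener measure is an iterated integral of Gaussian transition densities started at $u_0 = 0$. The following sketch evaluates it with a midpoint rule, one time point at a time. It assumes the usual transition density of Brownian motion, $p(u_{i-1}, u_i; \Delta t) = (2\pi \Delta t)^{-1/2} \exp[-(u_i - u_{i-1})^2/(2\Delta t)]$; the function names and the grid size are illustrative choices.

```python
import math

def transition_density(u_prev, u, dt):
    """Gaussian density of B(t + dt) at u given B(t) = u_prev."""
    return math.exp(-(u - u_prev) ** 2 / (2.0 * dt)) / math.sqrt(2.0 * math.pi * dt)

def cylinder_measure(times, box, n_grid=200):
    """Approximate mu(A) for A = {w : w(t_i) in [lo_i, hi_i], i = 1, ..., n}.

    `times` holds 0 < t_1 < ... < t_n and `box` the pairs (lo_i, hi_i).
    """
    nodes, masses, t_prev = [0.0], [1.0], 0.0  # unit mass at u_0 = 0 at time t_0 = 0
    for t, (lo, hi) in zip(times, box):
        step = (hi - lo) / n_grid
        new_nodes = [lo + (i + 0.5) * step for i in range(n_grid)]
        # Propagate the mass at the previous time point through the transition density.
        new_masses = [
            step * sum(m * transition_density(v, u, t - t_prev)
                       for v, m in zip(nodes, masses))
            for u in new_nodes
        ]
        nodes, masses, t_prev = new_nodes, new_masses, t
    return sum(masses)

# One time point: mu{w : |w(1)| <= 1} equals erf(1 / sqrt(2)).
one_point = cylinder_measure((1.0,), [(-1.0, 1.0)])
# Two time points: B(0.5) essentially unrestricted, B(1) >= 0, so mu is close to 1/2.
two_points = cylinder_measure((0.5, 1.0), [(-6.0, 6.0), (0.0, 6.0)])
```

The cost grows with the square of `n_grid` for each added time point, so the sketch is only practical for a small number of time points.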


B.6.3 Multiple Wiener-Ito Integrals

We have seen in Example B.46 that the double integral $\int_a^b \int_a^b f(s) f(t)\,dB(s)\,dB(t)$ is
a polynomial chaos $I(f)^2 - \|f\|^2$ of order $n = 2$. Our objectives are to define multiple
integrals of the type $I(f_1, \ldots, f_k) = \int_a^b \cdots \int_a^b f_1(t_1) \cdots f_k(t_k)\,dB(t_1) \cdots dB(t_k)$,
$k = 1, 2, \ldots$, where $f_1, \ldots, f_k \in L^2[a, b]$ and $B$ is a standard Brownian motion
defined on $(\Omega, \mathcal{F}, P)$, and show that these integrals are polynomial chaoses.

The construction of $I(f_1, \ldots, f_k)$ is conceptually similar to that of the stochastic
integral in Sect. 4.4. However, developments in Sect. 4.4 do not extend directly to
$I(f_1, \ldots, f_k)$ since the expression of this integral for simple functions involves
products of increments of the Brownian motion over the same time intervals.


B.6.3.1 Two-Dimensional Wiener-Ito Integrals 

Consider the stochastic integral $I_2(f) = \int_a^b \int_a^b f(s, t)\,dB(s)\,dB(t)$, where $B$ denotes
a standard Brownian motion and $f \in L^2([a, b]^2)$. We summarize the essential results
in [23] (Sect. 9.2) on double Wiener-Ito integrals.

Definition B.46 Let $f : [a, b]^2 \to \mathbb{R}$ and call

$$\hat f(s, t) = (f(s, t) + f(t, s))/2 \tag{B.64}$$

the symmetrization of $f$. Note that $\|\hat f\| \le \|f\|$, where $\|f\|^2 = \int_a^b \int_a^b f(s, t)^2\,ds\,dt$
is the norm in $L^2([a, b]^2)$.

Definition B.47 A function $f : [a, b]^2 \to \mathbb{R}$ is called an off-diagonal step function
if it has the expression

$$f(s, t) = \sum_{i,j=1,\, i \ne j}^{n} a_{ij}\, 1\big((s, t) \in [t_{i-1}, t_i) \times [t_{j-1}, t_j)\big), \tag{B.65}$$

where $a = t_0 < t_1 < \cdots < t_{n-1} < t_n = b$ is a partition of $[a, b]$ and $a_{ij} \in \mathbb{R}$ are
constants. Off-diagonal functions vanish on the diagonal $D = \{(s, t) \in [a, b]^2 : s = t\}$
of $[a, b]^2$ and the set of these functions is a vector space.

Consider the linear operator

$$I_2(f) = \sum_{i,j=1,\, i \ne j}^{n} a_{ij}\, (B(t_i) - B(t_{i-1}))\, (B(t_j) - B(t_{j-1})) \tag{B.66}$$

defined on the set of off-diagonal functions.

Theorem B.76 If $f$ is an off-diagonal function, then $I_2(f) = I_2(\hat f)$, $E[I_2(f)] = 0$,
and $E[I_2(f)^2] = 2 \int_a^b \int_a^b \hat f(s, t)^2\,ds\,dt = 2\|\hat f\|^2$ ([23], Lemmas 9.2.2 and 9.2.3).
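The three properties of the double integral of an off-diagonal step function can be checked numerically. The sketch below is an illustration only: it stores the constants $a_{ij}$ of the step function in a hypothetical matrix `A` on a hypothetical partition with interval lengths `DT`, evaluates $I_2(f) = \sum_{i \ne j} a_{ij} \Delta B_i \Delta B_j$ with $\Delta B_i = B(t_i) - B(t_{i-1})$, and estimates the first two moments by Monte Carlo.

```python
import math
import random

def double_integral(a, d_b):
    """I_2(f): sum over i != j of a[i][j] * dB_i * dB_j."""
    n = len(d_b)
    return sum(a[i][j] * d_b[i] * d_b[j]
               for i in range(n) for j in range(n) if i != j)

def symmetrize(a):
    """Coefficients of the symmetrization (f(s, t) + f(t, s)) / 2."""
    n = len(a)
    return [[0.5 * (a[i][j] + a[j][i]) for j in range(n)] for i in range(n)]

def squared_norm(a, dt):
    """Squared L^2([a, b]^2) norm of the off-diagonal step function."""
    n = len(dt)
    return sum(a[i][j] ** 2 * dt[i] * dt[j]
               for i in range(n) for j in range(n) if i != j)

def sample_moments(a, dt, n_samples=50000, seed=0):
    """Monte Carlo estimates of E[I_2(f)] and E[I_2(f)^2]."""
    rng = random.Random(seed)
    first = second = 0.0
    for _ in range(n_samples):
        # Brownian increments over the partition are independent N(0, dt_i).
        d_b = [rng.gauss(0.0, math.sqrt(h)) for h in dt]
        value = double_integral(a, d_b)
        first += value
        second += value * value
    return first / n_samples, second / n_samples

DT = [0.2, 0.3, 0.1, 0.4]            # hypothetical partition of [0, 1]
A = [[0.0, 1.0, -2.0, 0.5],          # hypothetical constants a_ij, zero diagonal
     [3.0, 0.0, 1.5, -1.0],
     [0.0, 2.0, 0.0, 1.0],
     [-0.5, 1.0, 2.0, 0.0]]
A_HAT = symmetrize(A)
mean, second_moment = sample_moments(A, DT)
target = 2.0 * squared_norm(A_HAT, DT)
```

The equality $I_2(f) = I_2(\hat f)$ holds sample by sample, whereas `mean` and `second_moment` match 0 and `target` only up to sampling error.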

Theorem B.77 If $f \in L^2([a, b]^2)$, there exists a sequence $\{f_n\}$ of off-diagonal step
functions such that $\lim_{n \to \infty} \int_a^b \int_a^b |f(s, t) - f_n(s, t)|^2\,ds\,dt = 0$ ([23], Lemma 9.2.4).

Definition B.48 The m.s. limit $I_2(f) = \lim_{n \to \infty} I_2(f_n)$ in $L^2(\Omega, \mathcal{F}, P)$ is
called the double Wiener-Ito integral of $f$, where $\{f_n\}$ is the sequence in
Theorem B.77.

Theorem B.78 If $f \in L^2([a, b]^2)$, then $I_2(f) = I_2(\hat f)$, the first two moments of
$I_2(f)$ are $E[I_2(f)] = 0$ and $E[I_2(f)^2] = 2\|\hat f\|^2$, and $I_2(f)$ can be calculated
sequentially by the formula
$\int_a^b \int_a^b f(s, t)\,dB(s)\,dB(t) = 2 \int_a^b \left[ \int_a^s \hat f(s, t)\,dB(t) \right] dB(s)$
([23], Theorems 9.2.7 and 9.2.8).

Example B.51 The integral $\int_a^b \int_a^b dB(s)\,dB(t)$, that is, $I_2(f)$ for $f = 1$, can be
calculated as the limit of integrals $I_2(f_n)$ of simple functions (Theorem B.77) or
sequentially following Theorem B.78.

Let $a = t_0 < t_1 < \cdots < t_n = b$ be a partition of $[a, b]$ such that
$\max_{1 \le i \le n} (t_i - t_{i-1}) \to 0$ as $n \to \infty$. The sequences $\{f_n\}$ and $\{I_2(f_n)\}$ are those
in (B.65) and (B.66) with $a_{ij} = 1$. An alternative form of $I_2(f_n)$ is

$$I_2(f_n) = \sum_{i,j=1}^{n} \Delta B_i\, \Delta B_j - \sum_{i=1}^{n} (\Delta B_i)^2 = (B(b) - B(a))^2 - \sum_{i=1}^{n} (\Delta B_i)^2, \tag{B.67}$$

where $\Delta B_i = B(t_i) - B(t_{i-1})$. By Theorem B.77, we have
$I_2(f) = \lim_{n \to \infty} I_2(f_n) = (B(b) - B(a))^2 - (b - a)$, where the latter equality holds
since the limit of $\sum_{i=1}^{n} (\Delta B_i)^2$ is the quadratic variation of Brownian motion
(Sect. 3.7.6.1).

We now calculate the same integral by the sequential rule in Theorem B.78. Since
$f = \hat f = 1$, we have $I_2(f) = \int_a^b \int_a^b dB(s)\,dB(t) = 2 \int_a^b \left[ \int_a^s dB(t) \right] dB(s)$. The inner
integral is $\int_a^s dB(t) = B(s) - B(a)$ so that

$$\begin{aligned}
I_2(f) &= 2 \left[ \int_a^b B(s)\,dB(s) - B(a) \int_a^b dB(s) \right] \\
&= 2 \left[ \int_a^b B(s)\,dB(s) - B(a)\,(B(b) - B(a)) \right].
\end{aligned} \tag{B.68}$$

The integral form of Ito’s formula applied to $B(t)^2$ in the time interval $[a, b]$
gives $B(b)^2 - B(a)^2 = 2 \int_a^b B(t)\,dB(t) + \int_a^b d[B, B](t) = 2 \int_a^b B(t)\,dB(t) + (b - a)$
(Sect. 3.7.6.1 and Sect. 4.7), so that $I_2(f) = (B(b) - B(a))^2 - (b - a)$ by (B.68). $\diamond$
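Both routes of this example can be reproduced on a simulated partition. The sketch below is an illustration with hypothetical function names. It compares the off-diagonal double sum $\sum_{i \ne j} \Delta B_i \Delta B_j$ with the form $(B(b) - B(a))^2 - \sum_i (\Delta B_i)^2$ and with the left end point (Ito) sum $2 \sum_i (B(t_{i-1}) - B(a)) \Delta B_i$, which is the discrete counterpart of the sequential rule. It assumes equally spaced partitions and independent Gaussian increments with variance $(b - a)/n$.

```python
import math
import random

def brownian_increments(a, b, n, rng):
    """Increments of Brownian motion over n equal subintervals of [a, b]."""
    h = (b - a) / n
    return [rng.gauss(0.0, math.sqrt(h)) for _ in range(n)]

def off_diagonal_sum(d_b):
    """Double sum over i != j of dB_i * dB_j (all constants a_ij equal to 1)."""
    return sum(x * y for i, x in enumerate(d_b)
               for j, y in enumerate(d_b) if i != j)

def alternative_form(d_b):
    """(B(b) - B(a))^2 minus the sum of squared increments."""
    return sum(d_b) ** 2 - sum(x * x for x in d_b)

def sequential_sum(d_b):
    """Twice the Ito sum of (B(t_{i-1}) - B(a)) * dB_i."""
    total, path = 0.0, 0.0
    for x in d_b:
        total += path * x   # integrand evaluated at the left end point
        path += x
    return 2.0 * total

rng = random.Random(3)
coarse = brownian_increments(0.0, 1.0, 50, rng)
fine = brownian_increments(0.0, 1.0, 20000, rng)
limit_value = sum(fine) ** 2 - 1.0   # (B(b) - B(a))^2 - (b - a) for the fine path
```

On any partition the three discrete expressions coincide up to rounding. Only the replacement of $\sum_i (\Delta B_i)^2$ by $b - a$ requires the mesh of the partition to vanish.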


B.6.3.2 Multiple Wiener-Ito Integrals 

Stochastic integrals $I_k(f) = \int_{[a,b]^k} f(t_1, \ldots, t_k)\,dB(t_1) \cdots dB(t_k)$, $k \ge 2$, are
constructed in the same manner as the two-dimensional Wiener-Ito integral
considered in the previous section. For details, see [23] (Sect. 9.6).

As previously, let
$f(s) = \sum_{1 \le i_1, \ldots, i_k \le n} a_{i_1, \ldots, i_k}\, 1\big(s \in \times_{r=1}^{k} [t_{i_r - 1}, t_{i_r})\big)$, $s \in [a, b]^k$,
be a step function corresponding to a partition $a = t_0 < t_1 < \cdots < t_n = b$ of
$[a, b]$. An off-diagonal step function is a step function with $a_{i_1, \ldots, i_k} = 0$ if $i_p = i_q$
for $p \ne q$, that is, the function is 0 whenever the intervals $[t_{i_1 - 1}, t_{i_1}), \ldots, [t_{i_k - 1}, t_{i_k})$
are not disjoint. For an off-diagonal step function $f$, set
$I_k(f) = \sum_{1 \le i_1, \ldots, i_k \le n} a_{i_1, \ldots, i_k} \prod_{r=1}^{k} \Delta B_{i_r}$, where
$\Delta B_{i_r} = B(t_{i_r}) - B(t_{i_r - 1})$. The symmetrization of $f$ is the function
$\hat f(s_1, \ldots, s_k) = (1/k!) \sum_{\pi} f(s_{\pi(1)}, \ldots, s_{\pi(k)})$, where the summation is performed
over all permutations $\pi$ of $\{1, \ldots, k\}$, so that $\|\hat f\| \le (1/k!) \sum_{\pi} \|f\| = \|f\|$.

Following are facts that are given without proof and parallel properties of
two-dimensional integrals. If $f$ is an off-diagonal step function, then $E[I_k(f)] = 0$
and $E[I_k(f)^2] = k! \int_{[a,b]^k} |\hat f(s_1, \ldots, s_k)|^2\,ds_1 \cdots ds_k$ ([23], Lemma 9.6.3).
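The symmetrization and the second moment formula $E[I_k(f)^2] = k!\,\|\hat f\|^2$ can be illustrated for $k = 3$. In the sketch below an off-diagonal step function is stored as a dictionary that maps tuples of distinct interval indices to its constants. The partition `DT`, the randomly drawn constants `F`, and all function names are hypothetical, and the second moment is estimated by Monte Carlo.

```python
import itertools
import math
import random

def multiple_integral(coeffs, d_b):
    """I_k(f): sum over distinct index tuples of a[idx] * product of dB over idx."""
    return sum(a * math.prod(d_b[i] for i in idx) for idx, a in coeffs.items())

def symmetrize(coeffs, k):
    """Average the constants over all permutations of the k arguments."""
    perms = list(itertools.permutations(range(k)))
    return {idx: sum(coeffs[tuple(idx[p] for p in perm)] for perm in perms) / len(perms)
            for idx in coeffs}

def squared_norm(coeffs, dt):
    """Squared L^2([a, b]^k) norm of the step function."""
    return sum(a * a * math.prod(dt[i] for i in idx) for idx, a in coeffs.items())

def second_moment(coeffs, dt, n_samples=40000, seed=1):
    """Monte Carlo estimate of E[I_k(f)^2]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        d_b = [rng.gauss(0.0, math.sqrt(h)) for h in dt]
        total += multiple_integral(coeffs, d_b) ** 2
    return total / n_samples

DT = [0.1, 0.2, 0.3, 0.4]   # hypothetical partition of [0, 1]
K = 3
_rng = random.Random(0)
# Constants are defined only on tuples of distinct indices (off-diagonal function).
F = {idx: _rng.uniform(-1.0, 1.0)
     for idx in itertools.permutations(range(len(DT)), K)}
F_HAT = symmetrize(F, K)
target = math.factorial(K) * squared_norm(F_HAT, DT)
estimate = second_moment(F_HAT, DT)
```

As for $k = 2$, the identity $I_k(f) = I_k(\hat f)$ is exact for every sample because the product of increments is symmetric in its indices, while `estimate` agrees with `target` only up to sampling error.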

If $f \in L^2([a, b]^k)$, there exists a sequence of off-diagonal step functions $\{f_q\}$ such
that $\lim_{q \to \infty} \int_{[a,b]^k} |f(s_1, \ldots, s_k) - f_q(s_1, \ldots, s_k)|^2\,ds_1 \cdots ds_k = 0$ and $\{I_k(f_q)\}$ is
Cauchy in $L^2(\Omega)$. The limit $I_k(f) = \lim_{q \to \infty} I_k(f_q)$ in $L^2(\Omega)$ is well defined since
it does not depend on the particular sequence $\{f_q\}$, and is called the multiple
Wiener-Ito integral of $f$ ([23], Lemma 9.6.4). If $f \in L^2([a, b]^k)$ and $g \in L^2([a, b]^l)$, $k \ne l$,
then $E[I_k(f) I_l(g)] = 0$ ([23], Theorem 9.6.10). If $f \in L^2([a, b]^n)$, $n \ge 1$, then
$I_n(f) \in K_n$. If $\varphi \in K_n$, there exists a unique symmetric square integrable function
$f$ defined on $[a, b]^n$ such that $\varphi = I_n(f)$ ([23], Theorem 9.7.1), which shows that
multiple Wiener-Ito integrals are polynomial chaoses.

Index

Adapted process, 152, 156, 158, 167, 170, 178
continuous, 158, 170
Adapted stochastic process, 95, 143
Bessel’s inequality, 493, 494, 500
Beta distribution, 224, 354, 404 
Beta random variable, 407 
Beta standard variable, 442 
Bilinear form, 340, 359, 397 
bounded, 400, 496 
continuous, 398, 436 
elliptical, 397, 398, 437, 496 
Bilinear functional, 272, 273, 275, 339 
Bochner’s theorem, 70-74, 78 
Borel σ-field, 11, 398, 503, 509, 515
generated by 

open balls, 308, 464 
stochastic process, 468 
topology, 461 

Borel-Cantelli lemma, 21, 22 
Borel measurable function, 20 
Borel measurable mapping, 49, 461, 468 
Borel measure, 11
Borel sets, 11, 63, 70 
Boundary conditions 
deterministic, 387 
Dirichlet, 182, 439 
essential, 396 

homogeneous, 396, 448, 449 
natural, 253 
Neumann, 182 

Boundary value problem, 420 
deterministic, 396 
Dirichlet (q = 0), 184-188 
Dirichlet (q ≠ 0), 188-190
mixed, 190-192 

stochastic, 221, 393, 394, 398-403, 427, 
436, 446 

Bounded convergence, 29 
Bounded linear operator, 488, 489, 498, 499, 
501 

Bounded operator, 498 

Bounded variables, 210-211 

Bounded variation, 79, 80, 81, 130, 131, 176 

Bounded linear operators, 498-500 

Brownian filtration, 140, 193 

Brownian motion, 62 

as a martingale, 138, 140, 169, 176 
continuous time martingales, 97-99 
geometric Brownian motion, 163, 166, 
170, 173, 253, 255, 305 
non-differentiable sample paths, 4 
quadratic variation, 98, 100, 105, 106f,
148, 516

reflected at two thresholds, 182, 183 
reflected at zero 


local time, 181 
Tanaka’s formula, 181-184 
unbounded variation, 98 
Brownian motion integrators 
stochastic integral, 136 
$\mathcal{H}_0^2$ integrands, 137-138
$\mathcal{H}^2$ integrands, 138-142


Cauchy-Schwarz inequality, 41, 46, 68, 298, 
340, 341, 385, 398, 400, 406, 503 
Hilbert spaces, 490, 491 
linear functional, 496, 497 
metric space, 476 
normed-linear spaces, 484-486 
normed spaces, 504 
second moment calculus, 76 
Cauchy sequence, 395, 399, 482 
in Banach space, 489 
in L 2 , 78, 138, 145 
in metric spaces, 487 
m.s. Riemann-Stieltjes integrals, 79, 83 
in probability, 29 
Central limit theorem, 29 
Central moment of order, 24 
Change of measure 

absolute continuity of measures, 32 
continuity of probability measure, 21 
density function, 32-36 
Girsanov’s theorem, 193-196 
Radon-Nikodym theorem, 31 
Chapman-Kolmogorov equation, 90, 93 
Characteristic function, 94 

integro-differential equation, 314 
Levy-Khintchine formula, 104, 105 
Liouville equation, 291 
random variables, 36-38 
Chebyshev’s inequality, 137, 229, 309, 348

m.s. convergence, 28 
Chebyshev polynomials, 235 
modified, 232, 428 
products of, 428 
Closed set, 479-481 
Coefficient of variation, 24 
Collocation method, 224, 267, 276 
Colored noise, 4, 109, 175, 177, 246 
Compact operator, 499, 502 
Compact sets, 479-481
Compact support, 188, 395 
Compensated Poisson process, 100, 153 
compound process, 101, 142
Complete metric space, 308, 480, 483, 487
Completeness, 143 
Complete sets, 480 
Completion theorem, 487 
Compound Poisson process, 62f, 65, 69, 78,
79, 93, 94, 99-103
and α-stable processes, 105
and Brownian motion, 151, 281, 310 
covariance function, 108 
semimartingales, 146 
Conditional density function, 34 
Conditional distribution function, 34 
Conditional expectation 
change of fields, 38 
conditional distribution, 39 
conditional probability, 34-40
defining relation, 40, 42
probability with respect to σ-field, 35-40
Conditional probability, 13, 34 
Conjugate priors, 206 
Consistency condition, 64, 390 
Continuous in mean square, 76-77 
Continuous martingale, 140, 143 
Continuous stochastic process, 163 
Continuous time martingales, 95-97 
Brownian motion process, 97-99 
Levy processes, 103-107 
Poisson processes 

and compound Poisson processes, 
99-103 

white noise processes, 107-109 
Continuous time linear systems, 241 
Continuous time nonlinear systems, 248 
Continuum eigenfunction, 501 
Continuum eigenvalue, 501 
Contraction, in metric space, 482, 483 
Contraction factor, 482 
Convergence 

distribution, 28 
$L^p$, 142, 445-446

polynomial chaos representations, 361 
probability, 28, 145, 159 
probability one, 76, 89 
strong law of large numbers, 29 
weak law of large numbers, 29 
Convergence modes 

almost sure (a.s.) convergence, 28, 28f, 271
bounded convergence, 29, 264 
central limit theorem, 29 
coordinate convergence, 482 
dominated convergence, 29, 41, 142, 508 
mean square (m.s.) convergence, 28, 101, 
136, 404 


Cauchy-Schwarz inequality, 406 
finite difference scheme, 392 
Ito integral, 133-134 
real-valued random function, 77 
monotone convergence, 29, 41 
monotone convergence theorem, 29, 33 
Skorokhod convergence, 308 
weak convergence, 272, 308, 497-498 
Correlation, 68, 78, 381 
Countable sample space, 14 
Covariance, 25, 38, 86 

Covariance function, 67, 69, 115, 124f, 125f,
414f, 432

in translation space, 119f
target covariance function, 118, 120, 321 
Covariance matrix, 25, 212 


Decomposition method 

Cholesky decomposition, 48, 339, 458 
Doob’s decomposition, 44, 45, 143 
Doob-Meyer decomposition, 143-144, 
145, 149 

Levy decomposition theorem, 104 
Levy-Khintchine formula, 104 
unique, 486, 492 
Degrading systems, 327 
Dense subspace, 487, 501 
Density (probability density) function, 32-36 
conditional random variable, 357 
Fourier pairs, 37 
Gaussian (normal), 33 
standard bivariate Gaussian vector, 35 
conditional, 34 

DEs. See Deterministic equations (DEs) 
Deterministic equations (DEs), 1, 3, 317, 361, 
402 

Differential equations 

characteristic function, 172, 173 
density, Fokker-Planck equation, 251, 253, 
288, 313, 316
moments, 172 
Diffusion coefficient, 167 
Diffusion process, 167 

Directional wind speed, for hurricanes, 
211-213 

Dirichlet boundary value problem (q = 0),
184-188

Dirichlet boundary value problem (q ≠ 0),
188-190

Dirichlet problem, 188 
Discrete random variable

σ-field generated by, 39, 40
conditional expectation given by, 38-42
Discrete time linear systems, 239 
Distance, 476 

Distribution function, 32-36 
Gaussian, 34-35, 48, 214, 408 
non-Gaussian, 36, 48-50, 238 
multivariate translation distribution, 50 
Dominated convergence, 41 
Dominated convergence theorem, 29 
Doob-Dynkin lemma, 143, 184, 252, 437 
Doob maximal L 2 inequality, 47 
Doob-Meyer decomposition, 143 
Doob's decomposition, 44 
Double Wiener-Ito integral, 246, 515, 516 
Drift, 167 

Drift coefficient, 167, 490 
Dual space, 490 
Dynkin formula, 186 


E 

Eigenfunction, 83 

correlation function, 223, 463 
Karhunen-Loeve representation, 85 
Eigenvector, 354, 502 
Eigenvalue, 79, 83, 349 

correlation function, 223, 463 
differential equation, 383 
distribution, 352f

Karhunen-Loeve representation, 85 
Elliptical partial differential operator, 396 
Equivalent linearization method, 374-375 
Ergodic, 88 

ESROMs. See Extended stochastic reduced 
order models (ESROMs) 

Essentially bounded, 505 
Estimator, 50 

in Monte Carlo simulation, 50-53 
Euclidean norm, 169 
Euclidean space, 476 
Existence and uniqueness solutions 
Galerkin solutions, 432 
stochastic differential equations, 168 
Lax-Milgram theorem, 340 
strong solutions, 276 
weak solutions, 291, 359 
stochastic integral equation, 168, 178 
stochastic partial differential equations, 
381 

stochastic problems, 433 
Expectation operator, 22 
Exponential process, 193 


Extended stochastic reduced order models 
(ESROMs), 350, 427, 469, 470 


F 

Fatou’s lemma, 30 
Feynman-Kac formula, 189 
Filtered probability space, 43 
Filtration 

adapted stochastic process, 95, 143 
Brownian, 140, 193 
natural, 43 
right continuous, 33 
standard Brownian, 135, 140, 193, 382 
Finite difference method, 388 
Finite dimensional distributions, 17, 62-64 
Finite element solution, 187, 192, 239, 380, 
391 

Fixed point theorem. See Banach fixed point 
theorem 

Fokker-Planck equation, 251, 288 
Fourier coefficients, 492 
Fourier series, 120, 218 
Fourier transform, 122, 251, 289 
Fredholm integral equation, 450 
Fubini’s theorem, 22, 25, 26, 27, 54 
expectation element, 180, 313 
independence, 168 
integration order 

of physical and probability spaces, 246 
in measurable random function, 60, 80, 
172, 399 


G 

Galerkin method, 290, 357, 430, 431, 438-441 
Gamma density function, 53 
normal, 205, 259 
Gamma distribution, 349, 468 
Gamma function, 53 
Gaussian measure, 510 
Gaussian (normal) distribution, 216 
density, 33, 34-35 
linear transformations, 37 
multivariate, 49-50 
standard normal, 358 
Gaussian process 

Brownian motion, 168 
colored noise, 175 
Ornstein-Uhlenbeck process, 177 
stationary, 109-111 
white noise, 108, 109 

Gaussian random variable. See Gaussian
(normal) distribution
Gaussian variables, 203

Bayesian method, 205-206 
frequentist method, 204-205 
Gaussian white noise, 166-177 
Geometric Brownian motion, 163, 166, 253 
Ito equation, 177 

stochastic differential equation, 282 
Girsanov theorem, 193-196 
Gram-Schmidt orthogonalization procedure, 
493, 502 

Green’s theorem, 397 
Gronwall’s inequality, 319, 385, 451 


H 

Hausdorff space, 478 

Hermite polynomial, 360, 362, 507-510 

Hermitian, 72 

Hilbert space, 490, 491 

basis and Fourier representations, 491-495
bounded linear operators, 498-501
linear functionals, 496-497
spectral theory, 501 
weak convergence, 497 
Hölder’s inequality, 503, 504
Homogeneous chaos, 358, 510-511 
expansion, 511 

Homogeneous Poisson process, 142, 211 
Homeomorphic spaces, 478 

I 

Inclusion-exclusion formula, 12 
Independence 

σ-field, 10, 19, 20
events, 10, 18-21 
Gaussian field, 224 
probability space, 19 
random variables, 18, 313 
Inequalities 

Cauchy-Schwarz, 76 
Chebyshev, 28, 229 
Doob, 47 
Hölder, 503, 504
Jensen, 23, 41
Minkowski, 504 

Infinitely divisible characteristic function 
α-stable, 105, 310
Levy-Khintchine formula, 104-105 
properties, 9, 23, 24 
Infinitesimal generator, 166 
Inner product, 490, 491, 496, 506 
Integrals of random variables 
Fatou's lemma, 30 


Lebesgue’s theorem, 30 
properties, 36, 37 
Isometric mapping, 478 
Isometric operator, 271 
Isometry, 137, 138 
Ito calculus, 305 

arbitrary semimartingales, 158 
continuous semimartingales, 156 
Ito formula 

multi-dimensional, 512 
one-dimensional, 52, 389 
R-valued semimartingales, 155 

arbitrary semimartingales, 158-162 
continuous semimartingales, 156-158 
$\mathbb{R}^d$-valued semimartingales, 162-164
and Stratonovich integrals, 164-165 
Ito calculus, applications, 165 
Girsanov’s theorem, 193-196 
random walk method, 184 

Dirichlet boundary value problem 
(q = 0), 184-188 

Dirichlet boundary value problem 
(q ≠ 0), 188-190

mixed boundary value problem, 
190-192 

stochastic differential equations, 166 
Gaussian white noise, 166-177 
semimartingale white noise, 178-181 
Tanaka’s formula, 181-184 
Ito integral, 132, 135, 145, 164 
See also Stochastic integral 
Ito isometry, 137, 138 
Ito stochastic differential equation, 165, 166 


J 

Jensen’s inequality, 23, 41, 96 
for random variables, 95 
Joint density function, 223 
Joint distribution, 17, 59 
Joint probability density function, 223 


K 

Karhunen-Loeve expansion, 84-86, 455-459
Kolmogorov’s continuity criterion, 64, 66-67


L 

$L^2(\Omega, \mathcal{F}, P)$, 41, 65-68, 138, 339, 340, 358,
359

$L^p(\Omega, \mathcal{F}, P)$, 30, 43
See also Probability space
$L^p$ spaces, 503, 504
as normed spaces, 504-505 
useful inequalities, 503 
Laplace operator, 186
Law of total probability, 13 
Lax-Milgram theorem, 340, 401, 497 
Lebesgue integral, 490 
Lebesgue measure, 11, 26, 81, 437 
deterministic BVPs, 401 
homogeneous chaos, 511
random fields, 112 
real-valued functions, 485 
spectral density, 71 
Lebesgue’s theorem, 30
Legendre polynomials, 291, 293, 362 
Levy measure, 104, 105 
Levy process 

Levy decomposition, 104 
Levy-Khintchine formula, 104-105 
quadratic variation, 106f

Linear functionals, 498-499 

Linearity, 41, 486, 492 

Linear operator, 488, 497-498 

Linear spaces, 483-484 

See also Vector spaces 

Linear stochastic differential equation, 271 

Linear transformation, 37

Lipschitz condition, 169 

Localizing sequence, 96 

Local martingale, 96, 147 

Local solution 

Feynman-Kac functional, 189 
random walk method, 184 
Schrodinger equation, 183 
Local time, 181 
Lyapunov exponent, 306 


M 

Marginal density function, 63, 92, 249, 283,
284, 317f

homogeneous random field, 433
Marginal distribution, 63, 87, 265, 468, 471f
Markov process, stationary, 92-93, 129 
Markov property, 167 
Markov random function, 90 
Markov chains, 90-92 

Chapman-Kolmogorov equation, 93 
Markov process, 92-93 
Martingale 

continuous time, 95, 97 
discrete time, 42-45, 95
Doob decompositions, 44 


Doob inequality, 47 
Doob-Meyer decomposition, 143-145 
as integrators, 142, 144, 146 
Jensen inequality, 23, 41, 95 
stopped, 44, 96 
submartingale, 44, 96 
supermartingale, 95, 96 
variation and covariation, 148-152 
Mean square (m.s.) convergence, 28, 384, 390, 
392, 394 

Mean square error, 38, 374, 455 
Measurable function, 16-18, 25, 60 
Measurable sets, 11, 39, 65 
Measurable space, 10, 11, 16-18 
probability space, 43, 50, 51, 63 
Radon-Nikodym derivatives, 31, 42 
Metric space, 475-478, 487 
closed set, 479-481
compact sets, 479-481
complete set, 479-481
completion of, 487 
contraction, 482 
sequences, 481-482 
topology generated by, 477 
Microstructures, probabilistic models for, 222 
linear models 

dependent coefficients, 227-234 
independent coefficients, 223-227 
Minkowski inequality, 504 
Mixed boundary value problem, 190-192 
Modulus inequality, 41 

Moment of order, 24, 50, 173, 306, 320, 
325-327, 345, 351, 469 
marginal, 422 
Monotone convergence, 41 
Monotone convergence theorem, 29, 33 
Monotonicity, 41 
Monte Carlo simulation, 337, 345 
improved measure change, 50 
Fourier series, 120 

linear differential equations, 286, 342 
memoryless transformations, 48, 171
non-Gaussian process and field, 113, 118f,
238

non-stationary Gaussian process and field, 
109, 119, 124, 230 
random variable, 203 

stationary Gaussian process and field, 103, 
123 

random fields, 112-113 
sampling theorem, 459, 460 
spectral representation, 81 
stochastic processes, 109-111
translation vector processes, 113-119
m.s. convergence. See mean square 
convergence 

m.s. Riemann-Stieltjes integrals, 79 
Multiple Wiener-Ito integral, 515-517 
Multivariate Gaussian, 50 
Multivariate translation distribution, 50 


N 

Natural boundary, 253 

Neumann series, 300-303 

Neumann series method, 373, 449-450

Noise induced transition, 311 

Noninformative density, 206 

See also Vague prior density 

Norm 

Euclidean, 485, 489 
operator, 488 

Normed linear spaces, 484 
basis and separability, 487 
metric spaces, completion of, 487 
operators, 488, 490 
Normed vector space, 485, 487 
Null set, 11, 380 
Nyquist sampling rate, 460 


O 

ODE. See Ordinary differential equation 
(ODE) 

Off-diagonal step function, 516-517 
Open ball, 16, 242, 308, 477, 478 
Open set, 59, 60, 477, 479, 398 
Operator 

bounded, 488, 489, 497, 498, 499, 502 
closed, 480, 481, 492 
compact, 499, 500, 502 
continuous, 489 
differential, 192, 395, 396 
isometric, 487 

linear, 41, 81, 83, 491, 494, 500, 502-504, 
506, 516 
matrix, 500, 502 
projection, 496, 499 
self-adjoint, 83, 498, 499, 502, 506 
Operator norm 

bounded operator, 498 
self-adjoint operator, 502, 506 
Optional stopping theorem, 47 
Ordinary differential equation (ODE), 
172-173, 180, 287 
finite differences, 387, 390, 394 


linear, 2, 5, 172-173, 215, 222, 245, 251, 
255, 273-274, 390, 394 
second order, 237 
stochastic, 3 
white noise, 5 

Ornstein-Uhlenbeck process, 168, 171-173, 
177, 255, 260, 277, 284 
Orthogonal functions, 40, 495, 508-509 
Orthogonal increments, 81-83, 457 
Orthogonal polynomials, 506, 507 
Hermite polynomials, 513 
homogeneous chaos, 510-511 
multiple Wiener-Ito integrals, 517 
two-dimensional Wiener-Ito integrals, 
515-517 

Orthogonal projection, 40 
Orthonormal basis, 359, 493, 495, 500, 502, 
508, 512 

Orthonormal sequence, 492, 498, 502, 508 
Orthonormal set, 494 

P 

Parallelogram law, 490, 491 
Parametric models, 464 

Karhunen-Loeve expansion, 457, 459, 463 
sampling theorem, 459, 460 
spectral representation, 457, 458, 459 
Parseval’s identity, 495 
Perturbation method, 269, 372 
Perturbation series, 368, 371, 448 
p-integrable martingale, 43 
See also Submartingale; Supermartingale 
Poincare-Friedrichs inequality, 398, 402 
Poisson process 

compensated, 153 

and compound Poisson process, 99-103 
quadratic variation, 106f, 148
filtration, 95, 99 

Polynomial chaos, 362-376, 436 
Positive definite, 88, 475, 484, 490 
Posterior densities, 206, 209f, 214, 259
Predictable process, 45, 96, 143, 144 
Predictable stochastic processes, 44, 96 
Probability density function, 32 
Probability distribution function, 17 
Probabilistic models, 201 

random functions, 214, 230 
multi-phase materials, 217 
probabilistic models for microstructures, 221, 379

uncertain parameters, systems with,
215, 216
random variables, 203 
bounded variables, 210 
directional wind speed for hurricanes, 
211 

Gaussian variables, 203, 211 
translation variables, 206 
Probability measure, 11 
extension of, 15-16 
Probability space, 9-16 
σ-field, 10-11
construction of, 14 

countable sample space, 14 
product probability space, 14-15 
sample space, 9-10 
Probability theory, essentials of 
characteristic functions, 36-38 
conditional expectation, 38-42
density functions, 32-36 
discrete time martingales, 42-47 
distribution functions, 32-36 
expectation operator, 22-27 
independent events, 18-21 
measurable functions, 16-18 
Monte Carlo simulation, 47-53 
estimators, 50-53 
Gaussian variables, 48 
non-Gaussian variables, 48-50 
probability measure, 11-14
extension, 15-16
probability space, 9-11

countable sample space, 14 
product probability space, 14-15 
Radon-Nikodym derivative, 31-32 
random elements, 16-18 
random variables 

convergence of sequences, 27-30 
sequence of events, 21-22 
Product probability space, 14-15, 339 
Product sample space, 14 
Pythagoras’ theorem, 492, 494, 495 


Q 

q-dimensional Euclidean space, 478
Quadratic variation, 148, 106 
Brownian motion, 98, 518 
compensated Poisson process, 100, 153 
Quadratic variation process, 100, 105, 126, 
171 

Quadratic variation and covariation 
integration by parts, 150 
polarization identity, 152 
Quantizers, 463-466


R 

Radon-Nikodym derivative, 31-32, 42, 51 
See also Monte Carlo simulation algorithms; 
Spectral density 

Random field, 73, 79, 112, 220, 224, 436 
Random function 

finite dimensional distribution, 62-64 
Monte Carlo simulation, 109 

Gaussian stationary random functions, 
109-113 

non-stationary Gaussian processes, 
119-124 

translation vector processes, 113-119 
multi-phase materials, 219, 221 
data set, 219-222 
spherical harmonics, 218-219 
probabilistic models for microstructures, 
221 

dependent coefficients, 227, 235 
independent coefficients, 223, 235 
sample properties, 64-67 
second moment calculus, 76 
integrals, 79-80 

Karhunen-Loeve expansion, 83-85 
mean square continuous, 76-77 
mean square differentiable, 77-79 
spectral representation, 81-83 
second moment properties, 67-70 
stochastic processes, classes of, 85 
continuous time martingales, 95-109 
ergodic random functions, 88-90 
Gaussian random functions, 86 
independent increments, 93-95 
Markov random function, 90-93 
translation random functions, 86-88 
uncertain parameters, 215-217 
weakly stationary random functions, 70 
R-valued stochastic processes, 72-73 
R-valued random variables, 73-75 
R-valued random variables, 70-72 
Random variable 

arbitrary, 22, 23, 34, 40, 85 
characteristic function, 36-38, 94-95, 101,
108, 136, 225 

continuous, 16, 64-65, 77, 96 
convergence, sequences of, 28f
density, 34, 93, 203, 288, 466 
discrete, 39 

distribution, 17, 32, 36, 48, 56, 260, 408, 
467, 473 

expectation, 22-23, 25, 36, 38-42, 51, 285, 
307, 347 

moment, 24, 67-70 
P-integrable, 401
standard deviation, 17, 220
variance, 76 
Random vector 

characteristic function, 265, 269, 287, 313 
correlation, 87, 325-332 
covariance matrix, 38, 49, 331 
density, 35, 94 
distribution, 17, 62-63 
distribution function, 234 
expectation, 22, 25 
Gaussian, 34-35, 37, 49, 97, 438 
independence, 18, 313 
joint density function, 207-208 
joint distribution function, 17 
moments, 24-25, 344 
second moment properties, 67-70 
Random vectors, 25, 295, 341, 414 
Random walk method, 184 

Dirichlet boundary value problem (q = 0), 
184-188 

Dirichlet boundary value problem (q ≠ 0),
188-190

mixed boundary value problem, 190-192 
Reliability method, 367, 368 
Residual spectrum, 501 
Resolvent, 501 
Riemann integral, 163, 175 
Riemann-Stieltjes integral, 79, 130-131, 143 
Riesz representation theorem, 498-499 
Right continuous filtration, 55 


S 

SAEs. See Stochastic algebraic equations 
(SAEs) 

Sample space, 9-10 
countable, 14 
Schauder basis, 487 
Schrodinger equation, 182, 184 
SDEs. See Stochastic differential equations 
(SDEs) 

Second moment calculus for processes 

expectation and mean square integrals, 76 
mean square continuity, 77 
mean square differentiation, 77-79 
mean square integration, 83 
spectral representation, 81, 109 
Second moment properties, 67-70 

stochastic process, 238, 241, 246, 248 
Self-adjoint operator, 83 
Semimartingale, 155-164 
Semimartingale white noise, 178-181 
Separability 

by a basis, 487 


topological space, 487 
Separable stochastic process, 66-67 
Separation of variables, 384 
Sequence of events, 21-22 
Sequences, 481, 482 
SEs. See Stochastic equations (SEs)
Simple stochastic process, 60, 469 
Smolyak formula, 367 

Solution of stochastic differential equation, 
168, 301 

Space 

Banach, 487 
complete metric, 480 
Euclidean, 414, 485 
Hilbert, 490 

Spectral density function, 71, 82 
Spectral distribution, 71-74, 89, 457 
Spectral theory, 501 
SPDEs. See Stochastic partial differential equations
(SPDEs)

SROMs. See Stochastic reduced order models 
(SROMs) 

Standard normal distribution, 225 
Standard normal random variable, 225 
State augmentation, 286, 287 
State space, 90-92 
Stationary distribution, 92 
Stationary increments, 97, 99, 103 
Stationary process 

in strong sense, 167, 170 
in weak sense, 167, 272 
Stochastic algebraic equations (SAEs), 346 
arbitrary uncertainty, 337 
general considerations, 339 
Monte Carlo method, 339, 342 
reliability method, 368 
stochastic collocation method, 363 
stochastic Galerkin method, 

359, 361 

stochastic reduced order model method, 
344-357 

small uncertainty, 370 

equivalent linearization, 374-375 
Neumann series, 374, 375 
perturbation series, 371, 372 
Taylor series, 369, 371 

Stochastic collocation method, 295, 299, 363,
440-446

Stochastic difference equations, 254 
conditional analysis, 260, 276 
Monte Carlo simulation, 260, 263 
perturbation series, 269 
stochastic Galerkin and collocation methods, 267
stochastic reduced order models, 265, 276 
Taylor series, 267-269 

Stochastic differential equations (SDEs), 166, 
271 

additive noise, 263 
Brownian motion input 

diffusion process, 313, 329 
conditional analysis, 277 
conditional Monte Carlo simulation, 278, 
279 

definition, 166, 169 

existence and uniqueness of solution, 179 
geometric Brownian motion, 170, 255, 305 
Gaussian white noise, 166-177 
linear, 251 

Monte Carlo simulation, 276, 277 
multiplicative noise, 237, 255 
Neumann series, 300, 303 
perturbation, 300, 303 
semimartingale input, 5, 158 
white noise, 178-181 
solution, 317 
state augmentation, 287 
stochastic collocation method, 295, 299 
stochastic Galerkin method, 290, 294 
stochastic reduced order models, 289 
Stratonovich integral, 164-166 
Taylor series, 300-301 
Wong-Zakai theorem, 175 
Stochastic equations (SEs), 1 
deterministic coefficients 

discrete time linear systems, 239 
continuous time linear systems, 241, 
248 

continuous time nonlinear systems, 
248, 251 

Stochastic Galerkin method, 290-295, 
351-363 

factors affecting accuracy, 362 
Stochastic integral 

$\int B\,dB$ and $\int N\,dN$, 131-136
associativity, 147
Brownian motion integrators, 136
$\mathcal{H}_0^2$ integrands in, 137, 138
$\mathcal{H}^2$ integrands in, 138-142
Martingale integrators, 142-146 
Ito integral, 137 
preservation, 147 

quadratic variation and covariation processes, 148-152

Riemann-Stieltjes integrals, 130-131 
semimartingale, 146, 148 


simple predictable integrand, 137-142 
Stratonovich integral, 132, 164 
Stochastic integral equation, 166 
Stochastic partial differential equations 
(SPDEs), 5, 379, 380-386 
discrete approximations of, 386-392 
Stochastic process 
adapted, 178 
classes of, 85 
continuous time, 95 
correlation, 70 
covariance, 69, 70, 74 
discrete time, 42, 95, 254 
distribution, 32 
expectation function, 22 
finite dimensional distributions, 62-64 
Gaussian, 86 
measurable, 26 
Poisson, 142, 151 
sample properties, 230 
second moment properties, 216, 227 
Stochastic process increment 

Brownian motion, 93, 98, 168, 307 
independent increments, 62, 69, 93, 107 
Gaussian, 4, 97, 97f, 164 
Markov process, 167 
orthogonal increments, 81-83, 457 
Poisson process, 69-70, 93 
characteristic function, 94 
stationary increments, 97 
stationary, independent increments, 93 
Stochastic reduced order models (SROMs), 
289, 290, 317-320, 393, 464-469 
Borel measurable mappings, 468 
extended stochastic reduced order models 
(ESROMs), 350-357 
Gamma random variable, 466 
uncertain dynamic systems, 317-320 
Stochastic stability, applications of, 304-310 
Stopping time, 46 
Stratonovich integral, 132, 133 
and Ito formula, 164-165 
Stratonovich stochastic differential equation, 
165 

equivalent Ito stochastic differential equation, 175, 197 
Strong continuity, 76 
Strong convergence, 360 
Strong law of large numbers (see also under 
Convergence), 29 
Strong Markov property, 167 
Strong solution (of stochastic differential 
equation), 167 
Strong uniqueness (of stochastic differential 
equation), 167 

Sturm-Liouville equation, 506 
Submartingale, 43 
Supermartingale, 43 
Sup-metric, 477 
Symmetry, 475 


T 

Tanaka’s formula, 166, 181-184 
Target covariance function, 118, 120, 321 
Taylor series, 300-301 
Taylor series method, 371, 374, 450 
Time change 

of local martingale, 96 
simplest case, 240 

Time change formula Ito integrals, 164 
Total variation process, 148, 150 
Triangle inequality, 475, 484 
Transition density, 92 
Transition probability matrix, 90-92 
Translation process, 115, 118, 468 
Translation variables, 206 
Bayesian method, 208-209 
frequentist method, 207-208 


U 

Unbounded operator, 501 
Uniform distribution, 216 
Uniform integrability, 83 


V 

Vague prior density, 206 

See also Noninformative density 

Vector spaces, 483 

finite dimensional, 484, 485 
infinite dimensional, 485 
n-dimensional, 484, 486 
normed, 485, 487 
Version, 2, 6, 66, 277 


W 

Weak convergence, 308, 497 
Weak law of large numbers (see also under 
Convergence), 29 
Weak solution, 272, 339 
Weak uniqueness, 167 
Weierstrass theorem, 366 
Well posed 

boundary value problem, 396 
linear partial differential equations, 390 
SPDE, 386, 396 

White noise processes, 3-4, 79, 107-109, 
174-175 

Wiener-Askey polynomials, 362 
Wiener-Ito integrals 
multiple, 517 
two dimensional, 515-517 
Wiener measure, 515 
Wong-Zakai theorem, 175 


